An Aptitude Test for Veterinary Medicine *


William A. Owens 


Iowa State College


Particularly since the war, schools of veteri- 
nary medicine have been able to enroll for 
training only a relatively small percentage of 
their applicants. An increased emphasis upon 
the importance of identifying the best-qualified 
candidates has resulted in a careful reexamina- 
tion of selection procedures and in recognition 
of the possible utilization of some sort of 
veterinary aptitude test. 

Accordingly, the problem of the present in- 
vestigation was to discover or to develop an 
efficient predictor, or predictors, of scholastic 
success during the first professional year of 
veterinary training. 

Preliminary findings are based upon the 
records of all (N =133) freshmen and sopho- 
mores who were enrolled in the School of 
Veterinary Medicine at The Iowa State College 
during the academic year 1947-48. 

Validation findings are based upon the academic records of 150 pre-veterinarians tested and subsequently enrolled in Veterinary Medicine at either Cornell University (N = 25), Michigan State College (N = 41), Kansas State College (N = 49), or Iowa State College (N = 35) during the academic year 1948-49.


Method 


Indices already available were examined as 
to their predictive utility. These included: 
high school academic average, pre-veterinary 
college average, grade in certain specific pre-veterinary courses, scores on the ACE psychological examination, and sub-test scores on Form 20 of the Moss Aptitude Test for Medical Professions (3).

Since none of the above proved to be a highly satisfactory predictor of academic success, it was decided to construct four special-purpose tests. Two of these were 50-item achievement tests, constructed with the assistance of the departments concerned, and over the content of the two most predictive pre-veterinary courses, chemistry and zoology. The remaining two were, respectively, 60- and 50-item aptitude tests designed to measure the same abilities as the most predictive pair of sub-tests in the Moss ATMP. The first, called “Paragraph Comprehension,” is a reading test; and the second, designated as “Verbal Memory,” involves the timed study of standard selections with a subsequent objective test upon accuracy of recall. Both are entirely new, and their content was judged by members of the veterinary staff at Iowa State to be representative veterinary content. The usual psychometric procedures relative to establishment of time limits, analysis of items, and general revision were employed following a cross-sectional experimental study of the four tests in 1947-48 and prior to a longitudinal study of their validity in 1948-49. The chemistry and zoology achievement tests were combined subsequent to the 1947-48 study, since their intercorrelation approached their reliabilities; the composite, shortened to 80 items, was simply called “Pre-veterinary Achievement.” It may be noted that none of these test results have been employed in selection and that the data are, in this regard, uncontaminated.

* The writer wishes to acknowledge the invaluable counsel and assistance of Dr. James E. Wert.

In the validational study, the criterion adopted for the evaluation of veterinary aptitude was that of academic success during the first semester or first two quarters of professional training. A total grade-point average for this period was employed for three reasons: (1) such an average is highly correlated with ultimate scholastic standing; (2) academic mortality is heaviest at this time; and (3) it seemed to be the longest waiting interval in a longitudinal study compatible with the urgency of the case. As these data were received from the several cooperating schools, each subject’s grade-point average was assigned a standard score value in a distribution for the institution from which it had been obtained.
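
The within-institution standardization described above can be illustrated with a minimal sketch. It assumes the standard score is an ordinary z-score (mean 0, standard deviation 1) computed separately for each school; the paper does not say which standard-score scale was used, and the function name and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_within_school(records):
    """Convert (school, gpa) pairs to (school, z) pairs, with z computed per school.

    Assumes every school has at least two distinct grade-point averages,
    so that its standard deviation is not zero.
    """
    by_school = defaultdict(list)
    for school, gpa in records:
        by_school[school].append(gpa)
    # Mean and population SD of the grade-point averages at each school.
    stats = {s: (mean(v), pstdev(v)) for s, v in by_school.items()}
    return [(s, (g - stats[s][0]) / stats[s][1]) for s, g in records]


# Hypothetical grade-point averages, not data from the study.
records = [("A", 2.0), ("A", 3.0), ("A", 4.0), ("B", 1.0), ("B", 3.0)]
scores = standardize_within_school(records)
```

Expressing each average relative to its own school's distribution removes differences in grading standards between institutions before the records are pooled.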

Statistical analysis of these data follows the 
usual correlational form with the exception 
that a discriminant function style of analysis 
has been employed in obtaining estimates of 
the weights to be assigned to each of the several
new tests in order to produce composite scores 
which maximize the differences between certain 
performance groups. In addition, multiple 
biserial correlations have been employed at 
one point, after a method described by Wert 
(4), to estimate the combined effects of the 
tests in predicting the dichotomous criterion 
of performance.
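
As a rough illustration of how discriminant weights of this kind may be obtained, the sketch below computes Fisher's two-group linear discriminant with NumPy. It is an assumption that this corresponds to the "discriminant function style of analysis" used here; the function name and the score matrices are hypothetical, and the weights are rescaled so that the last test has weight 1, which presumes that its raw weight is positive.

```python
import numpy as np


def fisher_weights(high, low):
    """Two-group Fisher discriminant weights, scaled so the last weight is 1.

    high, low: arrays of shape (n_subjects, n_tests) for the two criterion groups.
    """
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    mean_diff = high.mean(axis=0) - low.mean(axis=0)
    n_h, n_l = len(high), len(low)
    # Pooled within-group covariance matrix of the test scores.
    pooled = ((n_h - 1) * np.cov(high, rowvar=False)
              + (n_l - 1) * np.cov(low, rowvar=False)) / (n_h + n_l - 2)
    raw = np.linalg.solve(pooled, mean_diff)
    # Rescaling assumes the raw weight of the last test is positive.
    return raw / raw[-1]


# Hypothetical scores on two tests for a high and a low criterion group.
high = [[30, 24], [34, 22], [38, 25], [36, 21]]
low = [[24, 18], [26, 16], [22, 17], [28, 19]]
weights = fisher_weights(high, low)
```

A composite score for each subject is then the weighted sum of that subject's test scores, for example `np.asarray(high) @ weights`.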


Results 

The primary results of this investigation have been summarized in four tables. Prior to some systematic comment upon them, it seems appropriate to observe that the “corrected” odd-even reliability of the composite or total test score derived from the several weighted sub-test scores was 0.88. This estimate is based upon results obtained from the entire tested population of 424 candidates for admission into veterinary training at the previously named cooperating institutions. Total test reliability thus seems reasonably satisfactory.¹

To proceed, in Table 1 are shown the correlations between various predictors and freshman veterinary average. Under “Existing Indices” it is interesting to note that pre-veterinary chemistry average appears to be the best single predictor. A possible artifact involved is the relatively large variance in chemistry grades as compared with that typical of other pre-veterinary subjects.

Sub-tests three and six of the Moss ATMP suggested functions to be measured by the two new aptitude tests, and the former may be broadly considered as models for the latter.

¹ As given with unlimited time in the preliminary study, the reliabilities for the sub-tests are: Pre-veterinary Achievement, 0.65; Paragraph Comprehension, 0.78; and Verbal Memory, 0.74.

Table 1

Correlations of Various Predictors with Freshman Veterinary Average
(Preliminary Study, N = 133)

Existing Indices
    Total pre-veterinary average
    Pre-veterinary chemistry average
    Pre-veterinary zoology average
    Raw score on ACE

Moss ATMP Subtests
    Visual Memory
    Memory for Content
    Comprehension and Retention
    General Information
    Vocabulary
    Understanding of Printed Material
    Application of Principles
    Logical Reasoning

Four New Tests
    Chemistry Achievement Test
    Zoology Achievement Test
    Paragraph Comprehension
    Verbal Memory
    Sum of Paragraph Comprehension and Verbal Memory

* At 5 per cent level r = .17; at 1 per cent r = .22.

As previously indicated, the “Four New Tests” were reduced to three following the evidence derived from the preliminary study and through the combination of the chemistry and zoology achievement tests to form the revised Pre-veterinary Achievement Test. In spite
of its apparently poor validity, this content was 
tentatively retained because it was recognized 
that the selection process at Iowa State, in- 
volving heavy emphasis upon pre-veterinary 
success in chemistry and zoology, brought 
about a tremendous restriction in range of 
talent on these tests which might not be 
duplicated at other institutions. In the event 
that it were not, it was felt that they could 
conceivably be of substantial value. 

From Table 2, then, it is apparent that each of the three new tests much better predicts academic inability from a relatively low cutting score than academic standing from the total range of scores (this admittedly within the successful group). As evidence, the product-moment correlations in the left-hand column are consistently and substantially smaller than the tetrachoric correlations² in the right-hand column. These latter were obtained by arbitrarily dichotomizing grade-point averages at their mean, and by then dichotomizing test score distributions at the points of minimum error in classification, at or slightly above the lowest quartile point. The percentage to the right of each coefficient is that below the cutting score in the truncated tail of the test distribution in question.

Table 2

Correlations of Test Scores with Grade-Point Average (Validational Study, N = 150)

                                     P.-M.           Tetrachoric r
                                     Correlation*    (G.P.A. dichotomized)
Pre-veterinary Achievement (X₁)      0.24            0.41
Paragraph Comprehension (X₂)         0.36 (0.47)     0.56
Verbal Memory (X₃)                   0.17            0.26

* 5% r = 0.16 and 1% r = 0.21

Table 3

Multiple Biserial Correlations and Discriminant Test Weights (Validational Study, N = 150)

                                     Multiple Biserial r
                                     (G.P.A. dichotomized)
X₂ and X₃ (all students)             0.37
X₂ and X₃ (within schools)           0.43
X₁, X₂ and X₃ (all students)         0.40
X₁, X₂ and X₃ (within schools)       0.45 (0.57)

Weights in the Discriminant Function
X₂ and X₃:          2.05 X₂ + X₃
X₁, X₂ and X₃:      0.58 X₁ + 2.13 X₂ + X₃

Since it appeared to be most efficient in each 
case to set cutting scores on the tests near the 
25th centile, it was arbitrarily decided to 
break the distribution of grade-point averages 
at the same level and to attempt to determine 
how best to weight each test to maximize the 
differences between these two segments of the 
criterion. 

Thus, in Table 3, are shown the multiple biserial correlations and discriminant-function weights based upon this proposed dichotomy in the criterion. It may be noted that the correlations for “all students” are considerably smaller than are those “within schools.” This is no doubt attributable to test-wise institutional differences so large that an individual’s scores might rank him in the highest quarter at school A and in the lowest at school B. The second series of discriminant function weights in Column 2 are those which were employed to derive a single series of composite or total scores, the “V scores,” for all tests.

² Magnitudes estimated from the computing diagrams of Chesire et al.

Table 4 is a summary table. In it appear the tetrachoric correlations between the dichotomized composite test scores and the dichotomized criterion. The per cent to the right of each coefficient, again, indicates the proportion of cases below the minimum error cutting score in the tail of the test distribution. Final results were cast in this form to make them coincident with the form of the practical question as it always arises: is a given subject above or below some critical test score, and how does this argue for his being above or below some accepted grade standard?
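
To make the fourfold-table logic concrete, the sketch below estimates a tetrachoric correlation from a 2 x 2 table of counts with the well-known cosine-pi approximation. This is a swapped-in technique for illustration only: the coefficients reported here were read from computing diagrams, not computed this way, and the cell counts in the example are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    a: above cutting score, above grade standard
    b: above cutting score, below grade standard
    c: below cutting score, above grade standard
    d: below cutting score, below grade standard
    """
    if b * c == 0:
        return 1.0   # no discordant cases in at least one cell
    if a * d == 0:
        return -1.0  # no concordant cases in at least one cell
    return math.cos(math.pi / (1.0 + math.sqrt((a * d) / (b * c))))


# Hypothetical fourfold table, not data from the study.
r_t = tetrachoric_cos_pi(30, 10, 10, 30)
```

The approximation is most trustworthy when both variables are split near their medians; with a cut near the lowest quartile, as used here, it should be read only as a rough guide.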

It is to be regretted that the number of cases for each institution is not larger, although application of the chi-square test to these data suggests that the least significant tabled relationship surpasses the 2% probability level. A sort of increase in numbers may, of course, be achieved by ignoring institutional differences and combining the data. This, naturally, lowers the apparent degree of relationship; in this instance it results in a single rₜ of 0.52.





Table 4

Tetrachoric Correlations Between Composite Test Scores and Grade-Point Average (Validational Study)

Columns: Iowa State Coll. (N = 35); Kans. State Coll. (N = 49); Mich. State
Row (Criterion): High vs. Low; 0.70 (12%)


Interpretation 


In interpreting these results several conflicting influences must be recognized. The
estimates of test-criterion relationship provided 
in Table 4 may be thought of as overestimates 
for at least two reasons. First, they are 
based, after the fact, upon the most efficient 
test cutting score; whereas, in practice it may 
be impractical to find or to employ such a 
value, which will in any case show sampling 
fluctuations. Second, the scoring weights 
established for the combination of sub-test 
scores were derived from a composite of four 
populations and then applied to this same com- 
posite population. A “shrinkage” in discrim- 
inative efficiency must be expected when these 
weights are applied to the scores obtained from 
a new population, although the customary 
effect may be minimized in this instance since 
the original sampling was of a broader and 
more heterogeneous group than could have 
been obtained at a single school. 

Running counter to these two influences, which would tend to make the values of Table 4 appear to be overestimates, is the undoubted fact that the coefficients shown have been depressed by a marked restriction in the range of talent existing within the validational group. At some institutions, the standard deviation of composite test scores is 25 to 30 per cent larger in the distribution of candidates for admission than in the distribution of selectees. This is a fact mainly attributable to current selection on the basis of pre-veterinary grade-point average, a test-correlated variable. If Kelly’s (2) correction for homogeneity were applied to the present data to obtain estimates of test-criterion correlations within the population of candidates, many of the relationships here reported would be substantially increased in magnitude. For example, in Table 2, the Paragraph Comprehension test would correlate 0.47 with the criterion instead of 0.36; and, in Table 3, the final multiple correlation would be 0.57 instead of 0.45.
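
The direction and rough size of such a correction can be sketched with the familiar formula for direct restriction of range on the predictor. It is an assumption that this formula resembles the correction for homogeneity cited above; the sketch is not claimed to reproduce the corrected values quoted in the text, and the function name is hypothetical.

```python
import math


def correct_for_range_restriction(r, sd_ratio):
    """Estimate the correlation in the unrestricted group.

    r: correlation observed in the restricted (selected) group
    sd_ratio: unrestricted SD of the test divided by restricted SD
    """
    return (r * sd_ratio) / math.sqrt(1.0 - r * r + r * r * sd_ratio * sd_ratio)


# Observed r of 0.36 with the test SD assumed 25 and 30 per cent larger among candidates.
low_estimate = correct_for_range_restriction(0.36, 1.25)
high_estimate = correct_for_range_restriction(0.36, 1.30)
```

Under these assumptions the estimate rises from 0.36 to roughly 0.43 to 0.45, which shows how strongly selection on a test-correlated variable can depress an observed validity coefficient.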

In addition, there is evidence to suggest that within the groups admitted to veterinary training, those who had taken the tests, and who composed slightly more than half of the total number, were superior in performance to those who had not taken the tests. At least a partial explanation is that, almost without exception, those tested had had their pre-veterinary training at one of the four cooperating institutions. By and large students with this background make better grades and fail less frequently.

In evaluating these conflicting influences the writer is disposed to judge it as not unlikely that they approximately cancel and offset each other, and that the estimates of relationship reported are, therefore, not grossly in error.


Summary 


A veterinary aptitude test has been devised 
which had the following characteristics in the 
populations studied: 


(1) It had a reliability of 0.88.

(2) It had tetrachoric validity coefficients of from 0.48 to 0.72, against a grade-point average criterion.

(3) It was a better predictor of the specified criterion than were pre-veterinary grades, singly or collectively, or the ACE.

(4) It predicted most efficiently, within the validational group, from a relatively low cutting score.

Minnesota Psycho-Analogies Test *


Abraham S. Levine 


Human Resources Research Center, Lackland Air Force Base, San Antonio, Texas 


This project was undertaken with the general objective of developing an evaluation instrument for psychology students. Since it was
hypothesized that achievement in advanced 
psychology courses is to a large extent a func- 
tion of a complex of general academic ability 
and previous psychological background, a test 
was developed to provide a composite measure 
of these factors. This test comprises items of 
the four-alternative multiple choice variety in 
analogy form. The first part of each item 
contains general vocabulary and information. 
The latter part of these items consists of a 
broad sampling of psychological terms, con- 
cepts and expressions which attempt to sample 
as widely as possible the content of all the 
major fields of psychology. The response 
alternatives are all psychological in content. 
Thus the first two terms of the analogy are of 
a general nature, usually non-psychological in 
content; while the third and fourth terms are 
psychological in character. An example of 
this type of item, with the correct response in 
italics, is as follows: 


Orchestra: Violinist: : Test: (1. Battery, 2. Item Analysis, 3. *Item*, 4. Validity)


It was believed that a special analogies test of the type described would serve the dual function of selection and general evaluation, depending upon the point in a student’s career at which it was administered. More specifically, it was anticipated that the test would serve some or all of the following purposes:

1. Selection of students for certain advanced courses or for certain sections of advanced courses in psychology.

2. Selection of graduate majors in psychology.

3. Selection of applicants for special training programs such as the Veterans Administration Clinical Psychology Program.

4. General evaluation of professional fitness of a student completing requirements for a degree.

5. Measure of growth of student by utilization of two equivalent forms of the test at different stages in his training.

* This article is based on the writer’s Ph.D. thesis, done under the direction of Prof. Donald G. Paterson and entitled “A Psycho-Analogies Test as an Evaluation Instrument for Psychology Students,” completed in December, 1949, and on file in the University of Minnesota Library.


Preliminary Editions of the Test 


The first edition of the test, called “Psychological Analogies,” was exploratory in nature. It consisted of 100 items. This test correlated .60 with combined midquarter and final examination scores for 92 students in a Senior College class in Vocational and Occupational Psychology at the University of Minnesota.

The second or preliminary edition of the test was given the title “Psycho-Analogies.” It consisted of a total of 232 items, divided into two forms of 116 items each. While some of these items represented the more discriminating items of the original test, or modifications of these, the bulk of them were still new and untried. These new items were on the whole more carefully constructed, and greater attention was given to balancing subject matter content. The test was administered to several sections of Vocational and Occupational Psychology and Individual Differences at the University of Minnesota. These two courses are representative of intermediate and advanced courses in psychology in which juniors, seniors and graduate students are enrolled. A total of 161 cases was obtained on Form A and 156 cases on Form B. Correlation of a single form with combined midquarter and final objective examinations in the various sections ranged from .51 to .74. The available data also suggested that Psycho-Analogies was a slightly better predictor of course achievement than either a general analogies test (Miller Analogies, Form A or B) or a specially designed achievement pretest. Moreover, a tendency was noted for the correlations between Psycho-Analogies and course examination scores to be higher in the more advanced sections.

In order to meet the need of a brief pretest 
for purposes of screening or sectioning certain 
psychology courses, a short form of Psycho- 
Analogies was developed, consisting of the 50 
most discriminating of the 232 items. In those 
classes in which the test was not used as a 
basis for sectioning, so that the whole range of 
talent was available for purposes of validation, 
the correlations between the short form and 
combined midquarter and final examinations 
ranged from .41 to .68. 


Minnesota Psycho-Analogies 


The final edition of the test, Minnesota Psycho-Analogies, consisted of 150 items divided into two forms, A and B, of 75 items each. In the selection of these 150 items, item analysis data for internal consistency and difficulty of the preliminary forms were utilized.

In constructing the two forms a strong endeavor was made to balance them with regard to both difficulty and subject matter content, in order to obtain two equivalent forms. Four additional sample problems were included in both forms, making a total of eight items in each of the fore-exercises. This was done in order to minimize possible practice effect. In preceding editions of the test, a liberal time limit was allowed permitting almost all students to finish the test. No time limit was imposed on Minnesota Psycho-Analogies, in order to avoid even more stringently whatever negative influences the time factor may have on the scores of an analogy-type test. The responses to either form are indicated on a standard IBM answer sheet which may be machine scored.

One limitation should be pointed out: the 
population sampled for Psycho-Analogies dif- 
fered in certain characteristics from the popula- 
tion for which Minnesota Psycho-Analogies 
will be used. Practical considerations deter- 
mining the number and availability of sub- 
jects made it impossible to obtain the most 
appropriate group for the purpose of initial 
selection of items. However, the item difficulty was somewhat higher than 50 per cent, if appropriate correction is made for chance success. Since the instrument would eventually be used mostly on graduate students, who, as a group, are superior to the original sample, it was considered desirable to have the average item above the 50 per cent difficulty level.

In administering these forms the major concern was to obtain adequate normative data for graduating seniors and various levels of graduate students. A secondary objective was to obtain additional course prediction data wherever possible.

Both forms of Minnesota Psycho-Analogies were administered to 33 graduating seniors and 125 graduate students majoring in psychology at the University of Minnesota. The graduating senior sample consisted of volunteers and represents only 25% of the total available population, whereas the graduate students were required to take the test and 93% of them complied. There is some possibility, therefore, that the graduating seniors tested represented a positively biased sample from their population. Half of the total group took Form A first and the other half took Form B first, thereby producing a balanced experimental design to permit estimation of whatever practice effect may accrue from taking one form first. In addition to obtaining these normative data, Form A was administered to one section of Vocational and Occupational Psychology and a section of Introductory Laboratory Psychology in order to obtain additional course prediction data.

Table 1 presents the means and standard deviations of all groups who took Form A in the spring and summer of 1949. At this time only 50 of the graduate students had been tested. Almost the entire range from a chance score of 17.5 to the maximum score of 75 is utilized. Despite some overlap, which should be expected, the upper part of the range is utilized by graduate student majors. This is followed in order by graduating senior majors, Vocational and Occupational Psychology, and Introductory Laboratory Psychology. It should be pointed out that Vocational and Occupational Psychology was a course open to juniors, seniors and graduate students who had at least nine quarter credits in psychology but who were not necessarily psychology majors; and Introductory Laboratory Psychology was an elementary sophomore course. The graduate student sample was made up for the most part of graduate students who were more advanced on the average than the more complete sample reported in Table 2.

Table 1

Means and Standard Deviations of Groups Who Took Form A of Minnesota Psycho-Analogies

Group                          N
Graduate Student Majors        50
Graduating Senior Majors       33
Psychology 130                 39
Psychology 5                   23

Table 2

Means and Standard Deviations on Form A, Form B, and Forms A and B Combined, of Minnesota Psycho-Analogies for Four Groups of Students Majoring in Psychology

                                  Form A          Form B          Forms A and B Combined
Group                             M      S.D.     M      S.D.     M
Graduating Seniors                51     7.15     50     7.37     101.9
First Year Graduate Students      57     6.67     57.3   6.91     114.9
Second Year Graduate Students     61.6   6.25     60.0   6.46     121.6
Third Year Graduate Students      66.6   5.16     65.7   4.25     132.2

Fig. 1. Distribution of scores on Form A of Minnesota Psycho-Analogies for four groups of students majoring in psychology: Grad. Seniors (N = 33), First Year Grad. Students (N = 50), Second Year Grad. Students, and Third Year Grad. Students (N = 31). Each x = one student.

Results on Forms A and B separately and combined are presented in Table 2. The data for the total sample of 125 graduate students appear in this table. These graduate majors are broken down into first, second and third year groups depending on how much graduate work they had taken previously. The data
indicate that the variability of scores decreases 
as central tendency increases in successive 
year groups; and that perhaps Minnesota 
Psycho-Analogies is most useful at the graduat- 
ing senior and first year graduate student 
levels, and least useful for advanced Ph.D. 
candidates. These general trends are illus- 
trated in Figure 1, which presents the distribu- 
tion of scores on Form A. The critical ratios 
of the differences between means of successive 
groups were computed for Forms A and B 
combined. All of these critical ratios were 
significant beyond the 1% level, indicating 
that statistically significant discriminations are 
made between successive levels of psychology 
majors. 
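
The critical ratio referred to above is the difference between two means divided by the standard error of that difference. A minimal sketch follows; the means and group sizes are taken from the combined-forms figures for graduating seniors and first year graduate students, but the standard deviations of 13.0 and 12.0 are hypothetical placeholders, so the resulting value is illustrative only.

```python
import math


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two independent means over its standard error."""
    se_diff = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_2 - mean_1) / se_diff


# Means and Ns as reported for Forms A and B combined; the SDs are assumed values.
cr = critical_ratio(101.9, 13.0, 33, 114.9, 12.0, 50)

# A ratio beyond about 2.58 is significant at the 1% level (two-tailed, normal curve).
significant_at_1_per_cent = abs(cr) > 2.58
```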

Norms were developed for the four categories 
of psychology majors for each form separately 
and for both forms combined. These norms 
are presented in the manual designed to ac- 
company Minnesota Psycho-Analogies. Since 
courses of instruction and selection standards 
vary at different institutions, norms should be 
developed for each institution separately be- 
fore using Minnesota Psycho-Analogies as a 
service instrument. 

Available data tend largely to support the assumption that Forms A and B are equivalent for samples of graduating seniors and graduate students. However, there is some suggestion
that Form B is slightly more difficult, but the 
difference is only one raw score point between 
means. Since these forms were constructed 
largely on the basis of item analysis data of a 
sample selected from a more heterogeneous 
population, equivalence will eventually have 
to be determined separately for different popu- 
lations. At any rate, the possible difference in 
difficulty even for the particular populations 
represented by the samples studied is of a 
rather small magnitude from a practical point 
of view. 

Correlation between Forms A and B for the 
original sample of graduating seniors and 
graduate students was .89. For the total 
sample of graduate students alone the relia- 
bility estimate dropped to .78. Using the 
Spearman-Brown formula, the reliability esti- 
mate for the total test of 150 items for graduate 
students alone is .88. There is some indication 
in the above coefficients that for a restricted 
range of talent such as graduate students,
both forms of the test should be combined in 
order to provide a stable enough score for 
individual prediction or diagnosis. More prac- 
tical indicators of the stability of individual 
scores are obtained from the standard errors of 
measurement which are as follows: Form 
A=3.3, Form B=3.2, Forms A and B com- 
bined = 4.6. 
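
The step from the single-form reliability of .78 to the total-test estimate of .88 can be checked with the Spearman-Brown formula, sketched below together with the usual expression for the standard error of measurement. The function names are hypothetical, and the standard deviation in the second call is an arbitrary illustrative value, not one from the study.

```python
import math


def spearman_brown(r, length_factor=2.0):
    """Reliability of a test lengthened by length_factor, given reliability r."""
    return (length_factor * r) / (1.0 + (length_factor - 1.0) * r)


def standard_error_of_measurement(sd, reliability):
    """Standard error of measurement for a score distribution with the given SD."""
    return sd * math.sqrt(1.0 - reliability)


# Correlation of .78 between the two 75-item forms, stepped up to 150 items.
full_length = spearman_brown(0.78)

# Illustrative only: an assumed SD of 10 with a reliability of .75.
example_sem = standard_error_of_measurement(10.0, 0.75)
```

Doubling the length gives 1.56 / 1.78, or about .88, in agreement with the figure reported for the combined forms.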

Since the difference in means between the 
form administered first and the form adminis- 
tered second was not significant even at the 5% 
level, it may be concluded that there is no 
practice effect from taking one form first when 
there is no time limit for taking either form. 

The rather appreciable correlation of .72 was 
obtained between Miller Analogies, Form G, 
and the complete Minnesota Psycho-Analogies 
for the 96 graduate students on whom these 
test data were available. The magnitude of 
this correlation is partially attributable to the 
fact that the more advanced graduate stu- 
dents, because of the nature of the selection in- 
volved, tend as a group to be more “Miller- 
bright” as well as having more training in 
psychology. 

For the same sample of 96 graduate students, 
the mean raw score of 78.9 on Miller Analogies 
is about as relatively high in terms of the total 
possible score of 100 as the mean raw score of 
121.1 is on the total 150 items of Minnesota 
Psycho-Analogies. This comparison is de- 
fensible since both tests have four alternatives 
and the number right constitutes the score in 
both cases. This fact is pointed out since 
Miller Analogies, Form G, was developed 
especially for graduate students, and insofar as 
the principal use of Minnesota Psycho- 
Analogies will also be for graduate students, 
one of the main considerations is to have ade- 
quate ceiling for the upper ranges. Because 
items were selected for Minnesota Psycho- 
Analogies on the basis of an analysis of a lower 
mean level sample, there is some question as to 
adequacy of ceiling at the upper levels. It is 
interesting to note that the mean Miller score 
of 78.9 is almost at the 80th centile of graduate 
students in general. So it is also conceivable 
that the group of graduate students in the 
University of Minnesota’s Department of 
Psychology is somewhat above the mean of 
other graduate departments in Miller ability 
as well as in the kinds of special training which 
Minnesota Psycho-Analogies is measuring, and that therefore the norms developed here may be rather high for graduate departments in general.

Also significant is the comparison of standard 
deviations on both tests for the sample being 
considered. The standard deviation for Miller 
G is 8.98 and for the Psycho-Analogies, Forms 
A and B combined, is 12.59. Again, in view of 
the relative number of items, both tests may 
be regarded as having approximately equal 
variabilities for graduate students in psychol- 
ogy at the University of Minnesota. It is to 
be expected then that in those other depart- 
ments of psychology where the variability on 
the Miller is greater, the spread of Minnesota 
Psycho-Analogies scores will also be greater,
particularly in view of the possibility that the 
special informational content of Minnesota 
Psycho-Analogies may be unduly influenced by 
the subject matter content of courses in the 
Department at the University of Minnesota. 

Also similar are the shapes of the distributions of the two tests. Both Miller and Minnesota Psycho-Analogies test scores are negatively skewed. The form of these distributions may reflect to some extent the kind of selection that has taken place, and it is a plausible hypothesis that if graduate students in psychology at the University of Minnesota were less rigorously selected, then both distributions would assume a more normal form.

A correlation of .69 was obtained between combined examination scores for Introductory Laboratory Psychology and scores on Form A of Minnesota Psycho-Analogies. For Psychology 130 an r of only .30 was obtained. The latter r is significantly lower than hitherto obtained on similar samples with preceding forms of the test. The lower r may be partially attributed to the more restricted range of the criterion, since only a two-hour final examination was given instead of the usual one-hour mid-quarter and two-hour final. Upon querying the instructor for this course, another more interesting explanation emerged. It seems that in selecting items for the final examination, an attempt was made to weed out items which appeared to be saturated with either general ability or previous background, since less time was available for testing than is ordinarily the case.


Abraham S. Levine 


Conclusions 


It may be concluded on the basis of the data 
obtained in this project that a special analogies 
test is a useful predictor of achievement in 
psychology courses, particularly at the more 
advanced levels. That the special analogies 
test also functions as a good terminal evalua- 
tion instrument is indicated by the rise in 
mean scores with increased amounts of course 
work and with higher levels of attainment in 
psychology. 

As a selection instrument, Minnesota Psycho-Analogies
may be conceived of as a supplement
to Miller Analogies. Thus Miller Analogies
may be used as one of the criteria for admission
to graduate work, and Minnesota Psycho-Analogies
may be employed to further determine
competency to undertake advanced work
in psychology. As such, both instruments
will be used as successive hurdles, and the score
on Minnesota Psycho-Analogies may be better
interpreted in the light of Miller Analogies
ability. Whether or not any student who
attains a high score on Miller Analogies but
who gets a low score on the special analogies
should be excluded from a particular department
would be a function of the available
facilities at the time and the nature of the other
relevant data on the individual. Theoretically,
a student who gets a high score on Minnesota
Psycho-Analogies should also do well on Miller
Analogies. If this is not the case, it may be
due to the defects inherent in analogy tests
under timed conditions for certain kinds of
individuals, and it should provide the basis for
re-administration of Miller Analogies, as is
sometimes done. Minnesota Psycho-Analogies has
incorporated certain administrative advantages
based on experience with the Miller. Thus,
for example, there are two forms of the
instrument which make certain kinds of cheating
more difficult. Also, when it is necessary
to re-administer the test another form may
be given, thereby obviating specific memory
factors. The untimed nature of the test tends
also to reduce some of the sources of
contamination.

The basic rationale underlying the psychological
analogies test may be extended to other
fields where a similar need exists. Thus it
would be possible to develop similar instruments
for biology, sociology, political science,
economics, and so on. A still more useful
application in terms of the magnitude of the
selection task would be in the realm of medical
aptitude tests. A special analogies test designed
to measure achievement in biology,
chemistry, and physics, as well as general
ability may be developed for medical school
selection purposes. Or it may be found more
feasible to utilize such a test as a subtest in a
more diversified instrument. At any rate, the
data obtained in this project would tend to
indicate the feasibility of exploring the possible
uses of special analogies tests in other fields.


Received December 20, 1949. 








A Note on Norms for the Purdue Industrial Mathematics Test 
and the Adaptability Test 


Howard E. Page 
CNATra Staff, NAS, Pensacola 


The Management Engineering Division of 
the Naval Air Station, Pensacola, Florida is 
faced, from time to time, with the task 
of selecting from a large number of aviation 
tradesmen a few individuals for upgrading 
into positions of Planner and Estimator and 
Shop Planner. In the past, such selection has 
been made in terms of past experience in one 
of the aviation trades and successful and pro- 
gressive experience at the Journeyman level. 

Recently, personnel responsible for such 
selection have become interested in the use of 
psychological tests as an aid in such selection. 
Since no professionally trained psychologist is 
available on their staff, the writer has served 
in a consulting capacity on several occasions. 

A major difficulty has been the non-availabil- 
ity of adequately standardized trade tests with 
normative data applicable to the aviation trades. 
Lack of personnel and time has precluded the 
development of “tailor made” tests with the 
result that commercially available tests have 
been used and norms established for the local 
population. 

A recent administration of two tests, the
Purdue Industrial Mathematics Test and the
Adaptability Test by Tiffin and Lawshe,
provided data on a population of 152 Aviation
Tradesmen with a sufficiently large representation
from two trades to make feasible the
publication of such norms.

Table 1 summarizes the background data 
on this population. It is to be noted that 
48 per cent of the total population have served 
an apprenticeship in a trade, and that 71 per 
cent have completed Trade School training. 
Fifty-seven per cent of the population has 
completed at least a high school education. 
This is a larger percentage than might have 
been expected. On the other hand, the group 
is a relatively young group in terms of age and 
in number of years on the job. The range for 
years on the job was from two months to 22 
years. It must be concluded that personnel 
included in this group are well trained and 
experienced in their particular aviation trade. 

Table 2 presents the mean scores and standard
deviations for those trades where N was
large enough to make this meaningful. ASTP
trainees do somewhat better on the Industrial
Mathematics test and the Vocational Trade
School students do less well than does the
population reported. This is as one would
predict in terms of the higher educational level
of the ASTP students as compared with the
Aviation Tradesmen and the latter’s greater
“on the job” experience as compared with
Vocational Trade School students. On the
Adaptability Test the mean scores for the
Aviation trades are comparable to those published
by the authors for Naval Electrical
Trainees, Employees of a Piston Ring Manufacturing
Company and Total Group. Aviation
Tradesmen surpass Female Applicants
and fail to do as well as Employed Clerical
Workers and Purdue Seniors.


Table 1


Descriptive Data for Population Tested


                          No. Who Served
Trade                     Apprenticeship

Aviation Mech. Gen.             9
Metalsmith                     21
Machinist                      15
Electrician                    13
Aircraft Engine Mech.           7
Instrument Mechanic             1
Radio Mechanic                  1
Electroplater                   0
Unclassified                    1

Total                          68


No. Who Had        No. Having at Least    Average No.     Average
Trade Sch. Trng.   HS Educa.              Yrs. on Job     Age

34 32 : 34
32 26 35
10 37
11 38
11 5 37
4 : 33
3 7 39
1 43
2

29

108


Table 2


Mean Grade and Standard Deviation by Trade


                          Industrial Math Test     Adaptability Test
Trade                     Mean      S.D.           Mean      S.D.

Aviation Mech. Gen.
Metalsmith
Machinist
Electrician
Aircraft Eng. Mech.

Total

16.8
15.9
16.5
15.3
14.2
16.1 5.6

Table 3 presents the norms for the Adaptability
Test. Data to the left of the vertical
line are reproduced from Examiners Manual
for the Adaptability Test by Joseph Tiffin and
C. H. Lawshe and published by Science Research
Associates, 228 South Wabash Avenue,
Chicago, Illinois. To the right of the vertical
line are comparable norms for Metalsmith,
Aviation Mechanic General and the total
Aviation Tradesmen tested at Pensacola.


Table 3


Norms on the Adaptability Test


Table 4


Norms on Purdue Industrial Mathematics Test (Form A or B)


Percentile Score    Vocational Trade Sch. Students    ASTP Trainees

100 99 98 95
32 31 30 28 27 25 23 21 20 18 17 16 14 13 11 10
80
10

N = 125


Table 4 presents the norms for the Purdue 
Industrial Mathematics Test (Form A or B). 
Data to the left of the vertical line are repro- 
duced from Preliminary Manual—The Purdue 
Industrial Mathematics Test by C. H. Lawshe, 
Jr. and Dennis H. Price—and distributed by 
the Division of Applied Psychology, Purdue 
University, Lafayette, Indiana. To the right 
of the vertical line are the additional norms 
developed at Pensacola. 


28 27 26 23 20 18 16 14 13 12

Total Aviation Tradesmen    Aviation Mechanic General    Metalsmith

33 30 28 26
35 31 29
33 wD 28 26 24 21 19 18 16 15 13 12 9 7 4 2
24 22 20 18 17 16 14 12
20 19 17 16 14 13


The correlation between the Adaptability 
Test and the Industrial Mathematics Test for 
the total population (N = 152) proved to be .71. 
This seems very high for the two tests, where 
one is supposedly measuring arithmetic ability 
while the other attempts to measure general 
aptitude. An inspection of the items in the 
two tests, however, shows considerable overlap 
which would account for the high relationship 
found. 

The reliability of the two tests calculated by 
the Kuder-Richardson short formula was found 
to be .74 for the Adaptability Test, and .76 for 
the Industrial Mathematics Test. 
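
The figures in the two preceding paragraphs can be checked with a short calculation. The sketch below rests on several assumptions that the note itself does not state: that the "short formula" is what is now usually called Kuder-Richardson formula 21, that the Adaptability Test has 35 items, and that the total-group mean of 16.1 and standard deviation of 5.6 in Table 2 belong to that test. It also applies the familiar correction for attenuation to the correlation of .71, a step the note does not carry out.

```python
import math


def kr21(n_items: int, mean: float, sd: float) -> float:
    """Kuder-Richardson formula 21 from test length, mean and SD."""
    variance = sd ** 2
    return (n_items / (n_items - 1)) * (
        1 - mean * (n_items - mean) / (n_items * variance)
    )


def disattenuated_r(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Correlation corrected for unreliability in both measures."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Assumed inputs: 35 items, mean 16.1, SD 5.6 (see the lead-in).
adaptability_reliability = kr21(35, 16.1, 5.6)

# Reported values: r = .71, reliabilities .74 and .76.
corrected = disattenuated_r(0.71, 0.74, 0.76)

print(round(adaptability_reliability, 2))
print(round(corrected, 2))
```

Under these assumptions the formula gives about .74, and the corrected correlation comes out near .95, which is in line with the overlap in item content noted above.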
Measurement of a Complex Psychomotor Performance
by Means of a Printed Test


Lloyd S. Nesberg and Karl U. Smith


University of Wisconsin


A practical aspect of measurement of human 
capacities and traits is the simplification of 
present methods in appraisal of psychomotor 
performance. One possible approach to sim- 
plified design of the psychomotor test is the 
development of printed examinations which 
will scale sensory-motor capacity in the same 
general manner as the apparatus test. In the 
present study, developmental research on a 
printed test has been carried out in order to 
measure the performance involved in a judg- 
mental reaction time test, the Vector Complex 
Reactometer. This test has been developed 
for determination of some aspects of pilot 
aptitude. 


Methods 


1. The Vector Complex Reactometer. In this 
test (Figure 1) the subject’s reaction time is 
measured in turning a series of switches relative 
to the changing position and direction of three 
lights, the pattern of which is altered during a 
predefined sequence. The subject, in turning 
a given switch on the response board, selects a 
correct group of switches in terms of the rela- 
tive direction of a red light with respect to a 
green light and, thereafter, a particular correct 
switch within the group in terms of the position 
of a white light. The three lights are presented
on the stimulus panel automatically, and the
subject’s response immediately prepares the
apparatus for the presentation of the next
stimulus pattern.

The test under consideration here is designed 
to present forty different light patterns in a 
sequence, which may be repeated after the 
sequence is completed. It may be adminis- 
tered with a definite time limit, in which case 
the number of reactions in a given test period 
are scored, or the score may be defined as the 
time required to perform a given number of 
reactions. In the research conducted, the test
was scored in terms of the total number of
correct switches turned in a four-minute test
period.

2. The Motor Decision Test. In designing a 
printed form of this test (Figure 2) the same 
general principles of stimulus presentation and 
response involved in the Reactometer have 
been incorporated into each test item. Reference
to the sample item in Figure 2 will show
how the test was designed. Instead of the
four groups of five white lights in the stimulus 
panel of the apparatus test, the printed item 
presents four groups of circles each arranged 
as in the performance test. The critical 
stimulus among these groups of white circles 
is indicated by the black filled circle, as shown 
in the upper right hand group in the sample 
item. In the printed item, triangles and 
squares are substituted for the red and green 
lights of the apparatus test, and a black triangle 
and a black square constitute the critical 
stimuli for guiding the response. 

In the printed test, the subject responds by 
checking one of twenty-five inverted 7’s. 
These inverted 7’s represent the switches in 
the performance test, and are grouped in banks 
of five as in the performance test. The correct 
bank of switches is indicated by the relative 
position of the black triangle with respect to 
the black circle. The particular correct switch 
within this bank is indicated by the position of 
the black circle. 

The Motor Decision Test reproduces all
forty stimulus configurations of the apparatus
test. Some of these forty items are repeated
in the test to give a total of 108 items. The
completed form of the test consists of 12 pages
of test items, with 9 items on each page. The
score on this is the number of correct reactions
minus the number of items marked incorrectly.
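
The scoring rule for the printed test, correct reactions minus items marked incorrectly over 12 pages of 9 items, can be sketched as follows. The answer key and the subject's responses here are hypothetical; only the item count and the rights-minus-wrongs rule come from the text, and treating unreached items as neither right nor wrong is an assumption.

```python
from typing import Optional, Sequence

ITEMS_PER_PAGE = 9
PAGES = 12
TOTAL_ITEMS = ITEMS_PER_PAGE * PAGES  # 108 items, as stated in the text


def motor_decision_score(
    marked: Sequence[Optional[int]], key: Sequence[int]
) -> int:
    """Return rights minus wrongs; unmarked items (None) are ignored."""
    if len(marked) != len(key):
        raise ValueError("responses and key must have the same length")
    right = sum(1 for m, k in zip(marked, key) if m is not None and m == k)
    wrong = sum(1 for m, k in zip(marked, key) if m is not None and m != k)
    return right - wrong


# Hypothetical key: the index (0-24) of the correct symbol for each item.
key = [i % 25 for i in range(TOTAL_ITEMS)]

# Hypothetical subject: reaches 80 items in four minutes, 2 of them wrong.
marked = list(key[:80]) + [None] * (TOTAL_ITEMS - 80)
marked[10] = (key[10] + 1) % 25
marked[40] = (key[40] + 1) % 25

score = motor_decision_score(marked, key)
print(score)
```

With very few errors, as the authors later report, a score of this kind is governed almost entirely by the number of items reached.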


Experimental Results 


The treatment of experimental results will
be discussed in the following order: (1) the
standardization of the Motor Decision Test;
(2) correlations and interrelations of the
performance and printed tests; (3) the amount
and significance of transfer; and (4) sex
differences in performance.

Fig. 1. The Vector Complex Reactometer. This device, designed by Dr. Jack Buel, School of Aviation
Medicine, San Antonio, Texas, is manufactured by the Vector Manufacturing Co., Houston, Texas. The
instrument was obtained for research through the cooperation of Dr. Buel.

1. The Standardization of the Motor Decision
Test. Two preliminary forms of the Motor
Decision Test were designed and investigated
before a final copy of the test was drawn up.
On the basis of data obtained with these two
forms of the test as well as with the final
four-minute form, the following facts were
established. The four-minute test interval
established for this test length allowed none of the
subjects to solve all of the stimulus patterns.
Memory and fatigue effects were negligible.
Analysis of the location of errors does not reveal
any differences in item difficulty. Errors of
response to specific test items are not persistent.
Total errors are extremely small,
hence the number of items marked represents
the major factor in the scoring.

Fig. 2. Design of a sample item in the Motor Decision Test.


Table 1


A Quantitative Summary of Performance on the Printed and Apparatus Tests


Test                                            Sex

Group I Performance Test (50 subjects)          25 Males
Group II Performance Test (50 subjects)         Males
Group I Paper-Pencil Test (50 subjects)         25 Males
                                                25 Females
Group II Paper-Pencil Test (50 subjects)        25 Males
                                                25 Females


Distributions of scores on both performance 
and printed tests are approximately normal. 
A detailed quantitative description of data is 
presented in Table 1. This table gives the 
means, range and standard deviations in two 
groups of subjects for both sexes and all test 
sequences. Group I was given the perfor- 
mance test first and, 48 hours later, the 
printed test. Group II was given the printed 
test first. The subjects were college students. 
There are no great differences in the range of 
scores on the printed and performance tests. 
The means are consistently lower on the paper 
and pencil form regardless of its position in the 
testing order. Males as a whole are more 
variable in test performance than females. 
There is no consistent trend in variability with 
regard to test sequence. 

Reliability. The test-retest reliability of the 
Vector Complex Reactometer, computed from 
the scores of a separate group of 23 subjects, 
is +.86. This is the reliability found when a 
one-day period separated the two administra- 
tions. The test-retest reliability of the Motor 
Decision Test, based on the performance of 53 
subjects and a temporal separation of 7 days, 
is +.83. Data presented below will show that 
correlation between the apparatus and printed 
tests approaches the reliability of each, as 
just described. 
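
A test-retest coefficient of the kind reported here is an ordinary product-moment correlation between the scores from the two administrations. The sketch below shows the computation; the two score lists are invented for illustration and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists of at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# Hypothetical first and second administrations for eight subjects.
first = [62, 71, 75, 80, 84, 90, 95, 99]
second = [66, 70, 81, 79, 90, 88, 99, 104]

retest_reliability = pearson_r(first, second)
print(round(retest_reliability, 2))
```

The same function applied to paired scores on the apparatus and printed tests would give the cross-test correlations discussed in the next section.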

Females
25 Females

Range Mean

27-99
37-99

75.3
74.3

58-135 8
58-105 3

8
-83

37-100


2. Correlation between the Performance and
Printed Tests. Correlations between the two
tests were calculated for males and females
separately and for the two conditions of test
sequence used. These data are given in Table
2. All values given are significant at the 1 per
cent level of confidence. When the correlation
values given are transformed into Fisher’s Z,¹
and the standard error of the Z’s computed, the
differences are not statistically significant. It
is therefore reasonable to suggest that for the
conditions given, the Motor Decision Test
displays a high level of duplication of measurement
of the factors involved in performance
on the Reactometer.
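
The comparison of correlations through Fisher's Z can be sketched as follows. The example takes the largest and smallest coefficients in Table 2 (.84 and .64) and assumes 25 subjects behind each, as the subgroup sizes in Table 1 suggest; the authors do not list the Ns used for each coefficient, so that figure is an assumption.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's transformation, 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def z_ratio(r1: float, n1: int, r2: float, n2: int) -> float:
    """Difference between two transformed r's over its standard error."""
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se_diff


# With common logarithms the multiplier is ln(10) / 2, about 1.1513.
common_log_constant = math.log(10) / 2

# Largest and smallest r in Table 2; n = 25 per subgroup is assumed.
ratio = z_ratio(0.84, 25, 0.64, 25)

print(round(common_log_constant, 4))
print(round(ratio, 2))
```

The ratio falls below the conventional 1.96, so under these assumptions even the most divergent pair of coefficients does not differ significantly, which agrees with the authors' statement.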

3. Transference of Response between the Two
Tests. The design of the experiment permitted
determination of the degree to which prior
performance on the printed test affected scores
on the performance test and vice versa. This
transference may be attributed to general
situational learning occurring in one test situation
which is carried over to the second test, or to
specific skill acquired in one test situation which
is applicable to the second test. The amount
of transference of response or generalization is
indicated by the change in performance on one
test attributable specifically to the fact of prior
performance on the other test. When all
subjects are considered together, a statistically
significant increment in performance on the
printed test is found when this test is taken
after the performance test. The level of
confidence of the difference in scores on the
performance test when taken after the printed
test and when taken without such previous
testing experience also exceeds the one per cent
level. Analysis of variance of the transfer
data confirms these general statements and
indicates, furthermore, that there is no
significant effect of test sequence on transfer. This
analysis discloses, however, that there are
certain inherent differences in the measured
characteristics of the performance and printed
tests.


Table 2


Correlations between the Performance and
Printed Tests *


Groups                                      r

Composite for females                     +0.70
Composite for males                       +0.69
Females: performance test first           +0.82
Males: performance test first             +0.79
Females: performance test second          +0.84
Males: performance test second            +0.64
Composite: performance test first         +0.79
Composite: performance test second        +0.77


* All values significant at the 1% level of confidence.


¹ $Z = 1.1513 \log_{10} \dfrac{1 + r}{1 - r}$ (8)

4. Sex Differences in Performance on the Two
Tests. Values of t for differences in performance
for males and females were computed
for all data on each test and also for data
separately obtained on each test with respect
to test sequence. None of the values was
significant, which is consistent with the hypothesis
that sex has no effect on performance on these
two tests.

Summary 


A printed test, designated the Motor Decision
Test, has been designed to duplicate a
complex reaction time apparatus test which
has been manufactured for study of
aircraft-pilot aptitude. The apparatus test has been
given the name of Vector Complex Reactometer.

Utilizing an item design made up of differently
shaded forms which simulate light positions
and switch arrangements in the apparatus
test, it has been possible to duplicate in the
printed test all combinations of stimuli provided
in the apparatus test. Comparisons of
the two tests are made in terms of range of
test scores, relative reliability, interrelation of
test scores, and degree of generalization of
performance from one test to the other.

Results show that the two tests have approximately
equal test-retest consistency. The
reliability of the apparatus test is +.86, that
for the printed test +.83. These values are
only slightly higher than those found for the
intercorrelation between the two tests. For
different groups of subjects, the intercorrelation
values typically vary from +.70 to +.80.

Means of scores on the printed test, expressed
as total number of items answered
correctly, are usually somewhat lower than
those on the apparatus test. In the latter
case, scores are expressed as the total number
of effective responses made in a standard time.
The distribution and variability of scores for
the two tests are not significantly different.

An analysis has been made of the degree to
which performance on one test may affect
scores made on the other. A significant positive
transfer effect is found for both tests in
terms of the effect of the prior administration
of one on the test scores found in the later
performance on the other. Some significant
minor differences in the measured characteristics
of each test are found through such over-all
analysis of the tests when administered
successively.

In general, it has been concluded that the
printed Motor Decision Test duplicates extensively
results obtained with the complicated
apparatus test, and that such a printed test
could be used safely as a substitute for or in
conjunction with the apparatus test in screening
procedures in which complex reaction time
may be of practical significance.


Visual Skill and Performance in a Meat Packing Plant *


F. Nowell Jones 


University of California at Los Angeles 


and 
Charlotte Jean Smith 


United States Spring and Bumper Company 


To the best of the writers’ knowledge, no 
work on the testing of packing house employees 
has ever been reported. It would seem that 
there would be considerable possibility for the 
use of selection devices in this industry, since 
many of the jobs require a high degree of skill, 
and involve some hazard of injury, especially 
cuts. 


Preliminary Study 


The work on visual skill grew out of a pre- 
liminary study of a casings inspection depart- 
ment. Here the job required size grading and 
inspection by visual control, and results were 
negative when production was compared with 
the various Ortho-Rater measures, despite the 
fact that the criterion of production, output for 
two successive weeks, had a reliability of .96.¹
However, N was only 17, the total number of 
workers in the department, and so it seemed 
desirable to extend the study to other jobs 
in the plant. 


The Present Study 


The Jobs. The three jobs selected for study
were wiener skinning, bacon slice and wrap,
and shipping cooler work. Wiener skinning is
the procedure of removing the cellophane
“casings” from “skinless” frankfurters or
wieners after smoking. This involves the use
of a moderately sharp knife to cut the links
apart, and to start the tearing of the cellophane.
The skin is then removed by a spiral
tearing action. Study of this job had been
completed by the time study department, and
there was considerable interest in improving
production. The cellophane was quite difficult
to see, being transparent, and it is possible that
the job is performed best by “feel.”

* This article is based on part of the material submitted
by the junior author in partial fulfillment of the
requirements for the M.A. degree at the University of
Wisconsin. We wish to thank the Bausch and Lomb
Optical Company for making available the Ortho-Rater
used in this study, and to thank Mr. Harold
Jaeke, superintendent, and Dr. John M. McGinnes,
psychologist, of Oscar Mayer and Company, Madison,
Wisconsin, for respectively permitting us to use the
facilities of the plant, and assisting us in obtaining
subjects and criterion scores.

¹ The Wonderlic Personnel Test and the Minnesota
Rate of Manipulation Test were also administered to
this group. The correlations with the criterion were
Wonderlic, .31; Minnesota, placing, -.30; Minnesota,
turning, -.39. These tests are apparently worthy
of further study.

In the bacon slice and wrap operation we 
were concerned with the persons who lifted 
sliced bacon from a conveyer belt, weighed it 
into pound lots, and wrapped each lot in paper. 
Each operation involved visual control. 

The shipping cooler jobs were of the laboring 
type, in that the main work was moving stock 
to fill orders. It would not be expected that 
close visual control would be necessary here. 

Criteria. In the wiener skinning department
it was possible to obtain two independent
ratings of each employee, one by the foreman
and one by the time study man who had been
regularly assigned to this department. They
rated each worker on a percentage scale, with
100 indicated as the norm. Each rater made
two ratings, a month apart. Unfortunately,
the second rating by the foreman disappeared
in the mails, and was therefore irretrievably
lost. The two ratings by the time study man
correlated .94, and the average of his two
ratings correlated .81 with the foreman’s rating.

In the bacon slice and wrap department the
criterion consisted of the foreman’s rating on a
5-point scale. As a matter of fact, the 1 position
was not used, on the ground that all such
workers had been transferred out. A second
rating was made by the same foreman some
weeks later, and the two ratings correlated .84.
Considering the rather short scale used, this
indicates a reasonably high criterion reliability.


Table 1


Correlations Between Production and Criterion


                                        Correlations*

                                        Shipping      Bacon Slicers
                            Wiener      Cooler        and
Ortho-Rater Test            Skinners    Employees     Wrappers

Far Vertical Phoria           .32        -.13           .09
Far Lateral Phoria            .31         .12           .29
Near Vertical Phoria          .02         .06           .08
Near Lateral Phoria          -.03         .06          -.03
Far Acuity, Both Eyes         .37         .09           .09
Far Acuity, Worse Eye         .19         .24           .00
Near Acuity, Both Eyes        .05         .19           .07
Near Acuity, Worse Eye       -.18         .23           .13
Depth Perception              .20         .03           .03
Color Perception             -.10         .02           .04


* None of these correlations is significant; in no case does r/σ_r approach 2.58, and t’s show insignificant r’s for
wiener skinners.


The criterion in the shipping cooler was not 
at all satisfactory. The foreman’s rating was 
used as in the case of bacon slice and wrap, but 
only the 3 middle positions on the scale were 
used, and the correlation between the first 
rating and a rerating was only .45. 

Subjects. In bacon slice and wrap and 
wiener skinning the employees were almost 
entirely female. A further restriction in 
bacon slice and wrap was the hiring of only 
young workers. In the shipping cooler, the 
employees were predominantly male. The 
populations were as follows: bacon slice and
wrap, 47; wiener skinning, 26; shipping
cooler, 66.

Testing. Testing with the Ortho-Rater was
carried out in each department. Taking the
test was voluntary, but as a matter of fact there
were no “holdouts.” If glasses were customarily
worn on the job, they were worn for the
test. Testing was on company time.


Results 


Table 1 shows the correlation between the
criterion in each department and the various
scales on the Ortho-Rater. None is significant
at the 5% level, when tested by the
standard error. The correlations for wiener
skinning were also subjected to a t test, and
were found not to be reliable. Since a correlation
between a phoria test and a criterion would
be hard to interpret if the phoria scores were
taken straight through from high to low, the
r’s reported here are based on scores as deviations
from the norm. Correlations calculated
straight through show no significant relationship
either. In addition, each scatter diagram
was carefully inspected, and in some cases,
where it appeared useful, average performance
for each score on a given Ortho-Rater scale was
plotted, to determine whether or not the correlation
had failed to reveal a cut-off point,
or other relationship. In no case was this true.


Table 2


A Comparison of the Means and Standard Deviations for the Packing House Group and the OSRD Group


                              Packing House (N = 139)     OSRD
Test                          Mean        σ               Mean

Far Vertical Phoria           5.43        1.31             5.44
Far Lateral Phoria            7.59                         7.19
Far Acuity, Both Eyes         9.45        49              10.96
Far Acuity, Worse Eye         8.15                         9.58
Depth Perception              2.74                         4.39
Color Perception              4.30                         4.81
Near Acuity, Both Eyes        9.50                        11.91
Near Acuity, Worse Eye        8.50                        10.50
Near Vertical Phoria          4.14                         4.30
Near Lateral Phoria           8.57                         7.97
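
The two checks mentioned here, the ratio of r to its standard error and the t test for a correlation, can be sketched as below. The standard-error formula (1 - r²)/√(N - 1) is one classical form and is an assumption, since the authors do not say which they used. The example takes the largest coefficient in each column of Table 1 as read from this copy (.37, .24, .29) together with the department sizes given under Subjects.

```python
import math


def se_of_r(r: float, n: int) -> float:
    """Classical large-sample standard error of r (assumed form)."""
    return (1 - r ** 2) / math.sqrt(n - 1)


def t_for_r(r: float, n: int) -> float:
    """t statistic for testing r against zero, with n - 2 d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Largest r in each department (as read from Table 1) and its N.
largest = {
    "wiener skinning": (0.37, 26),
    "shipping cooler": (0.24, 66),
    "bacon slice and wrap": (0.29, 47),
}

ratios = {job: r / se_of_r(r, n) for job, (r, n) in largest.items()}
t_wiener = t_for_r(0.37, 26)

for job, ratio in ratios.items():
    print(job, round(ratio, 2))
print(round(t_wiener, 2))
```

On this reading, every ratio falls short of 2.58, as the footnote to Table 1 states.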

It is apparent that, within the limits of 
criterion validity and reliability, application of 
the Ortho-Rater would, on the face of the 
matter, add nothing to the selection of workers 
for the jobs under consideration. It is possible 
that some restriction of range of Ortho-Rater
scores might have lowered the relationships
found, and so we have compared our means
and standard deviations with those reported
in the OSRD study of reliability.² The means
are quite comparable, and our standard deviations
are, except for depth perception, larger.
The means and standard deviations are given 
in Table 2. It is difficult to escape the con- 
clusion that the Ortho-Rater would not be 
useful in this company, at least as far as the 
jobs under consideration are concerned. 


Summary 


1. The Ortho-Rater was used to measure 
visual efficiency of employees in three packing 
house departments: wiener skinning, bacon 
slice and wrap, and the shipping cooler. In 
all, 139 employees were tested. 

2. No significant correlation between Ortho- 
Rater scores and efficiency, as determined by 
foremen’s ratings (and in the wiener skinning 
department, by foreman’s and time study 
man’s ratings), was found.


The Myth of Chronological Age *
Austin S. Edwards 


University of Georgia 


In recent years there has been increasing 
study of the problems of age and aging (3, 4, 
5,7). Increasing numbers of the aged appear 
in our population and old age counseling has 
become a definite kind of psychological work. 
Many articles have appeared, but except for
occasional reference, no accurate information
is given concerning senescents who are reasonably
healthy as compared with those who are
seriously deteriorated or disabled (2). Generally
no distinction is made. See, for example,
the excellent article on chronological
age and quality of literary output (5).

This paper is concerned with the question 
of the differences between senescence and 
senility. It suggests that here, as elsewhere, 
chronological age is an inaccurate indication 
of ability and competence. To what extent do 
the senescents differ from the seniles? Are 
many senescents still quite as capable as so- 
called younger individuals? In what respects 
are healthy and relatively uninjured senescents 
as capable or more so than younger people? 
All of these questions have had insufficient
answers.

It is the purpose of this paper to consider 
one very important aspect of ability, namely, 
body, hand, and arm steadiness and to give 
some accurate information as to the difference 
in these respects between senescents who are 
not known to have any serious physical handi- 
caps, disease history, etc., and seniles, so 
diagnosed at a state mental hospital, and to 
compare both with the average for younger 
people. 


Body Sway. A limited number of Ss were
measured for body sway on two different
occasions with no special reference to disease
history. In both cases the senescent individuals
were practically as steady as the younger
Ss, whose ages ranged from 10 to 29. In the
first study, which gave age differences, the Ss
aged 50-69, twenty-four in number, showed
no greater body sway than did the younger Ss,
the mean and the median giving contradictory
results. In the later study with 23 Ss, aged
50-70, the difference, if any, was in favor of
the older Ss. This was true with eyes open;
the results with eyes closed showed little
difference, although some of the older Ss showed
somewhat more sway than the younger.

* Thanks are due Miss Anne Gilbert for assistance in
part of this study.

With insufficient numbers of cases, it does 
not appear that Ss from age 50 to 70, who have 
not suffered from disease or accident to any 
great extent, have necessarily more body sway 
than do younger Ss. In fact, taking individual 
cases, many of the older Ss have considerably 
less body sway than do many of the younger.

Hand and Arm Steadiness. Finger Movements.
A study has been made in connection
with finger movements. The writer’s finger
tromometer was used with standard procedure
(7). The tromometer measures finger movements
in three dimensions of space. Time of
measurement is 30 seconds. It is believed that
finger tremor (gross finger movements) is a
decidedly sensitive indicator. The mean finger
tremor of 1000 Ss (aged 16-35) is 35.3 mm.,
S.D. 20.25. The 65 senescents examined,
aged 60-85 (average chronological age, 70.4),
who had no serious amounts of disease history,
and who may be considered as reasonably
healthy individuals, had a mean finger tremor
of 42 mm., S.D. 24.5. The difference between
the means of these and of the 1000 younger
subjects showed an increase of finger movements
for the older group of only 19 per cent,
with a critical ratio of 2.21 (significant at 5
per cent level). Some of the oldest Ss were
actually the steadiest.

In contrast to this are the results of the 
measurements of the senile group at a state 
mental hospital. These 89 cases were diag- 
nosed in the case histories by the hospital staff 
as senile. Their ages were 54-90 (average 
chronological age, 67.7). Although the average
chronological age of the seniles was almost
three years less than the average of the senescents,
the average finger tremor was 133.3 mm.,
S.D. 66.8, which is more than three times as 
great as that of the former. The difference 
between the means of the seniles and the 1000 
younger Ss gave a critical ratio of 12.8, and that 
between the senescents and the seniles, a 
critical ratio of 12.0. 
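
The critical ratios quoted above can be approximated from the reported means, standard deviations, and group sizes. The sketch below assumes the usual definition of a critical ratio, the difference between two independent means divided by the standard error of that difference; the paper does not state which formula was used, and the function and variable names are illustrative only. Under this assumption the computed values (about 2.2, 13.8, and 11.8) are of the same order as the reported 2.21, 12.8, and 12.0 but do not match them exactly.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two independent means divided by its standard error.

    Assumed formula: SE = sqrt(sd_a^2 / n_a + sd_b^2 / n_b).
    """
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# Group statistics as reported in the text: (mean tremor in mm., S.D., N)
younger = (35.3, 20.25, 1000)
senescent = (42.0, 24.5, 65)
senile = (133.3, 66.8, 89)

cr_senescent_vs_younger = critical_ratio(*senescent, *younger)
cr_senile_vs_younger = critical_ratio(*senile, *younger)
cr_senile_vs_senescent = critical_ratio(*senile, *senescent)

# Relative increase of the senescent mean over the younger mean, in per cent
percent_increase = 100.0 * (senescent[0] - younger[0]) / younger[0]

print(f"senescent vs. younger: CR = {cr_senescent_vs_younger:.2f}")
print(f"senile vs. younger:    CR = {cr_senile_vs_younger:.2f}")
print(f"senile vs. senescent:  CR = {cr_senile_vs_senescent:.2f}")
print(f"increase for senescents: {percent_increase:.0f} per cent")
```

The small discrepancies may reflect rounding of the published means and standard deviations or a slightly different standard-error formula.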

Thus we find a small but statistically reliable 
difference between the senescents and the 
younger Ss, although the senescents cannot be 
said to have abnormal finger tremor or an 
amount that might be expected to have much, 
if any, practical significance. On the other 
hand, the seniles form a distinctly different 
group as compared with either of the others, 
and have an amount of finger and arm movement
(involuntary, uncontrolled movement)
that would be expected to have serious consequences
so far as skilled work involving fine
muscular control and steadiness is concerned. 

Men vs. Women. The differences between
men and women are not remarkable except
perhaps on one point. The senile men had
increased finger tremor in comparison with
women in almost the same relative amount as
occurs with our younger so-called normal
groups. On the other hand, the senescent
women showed relatively more increase than
did the senescent men. The average for
younger men is 39.83, S.D. 21.17, and for the
senescent men the average is 42.67, S.D. 26.2.
The norm for younger women is 30.33, S.D.
18.02, and the average for the senescent women
was disproportionately higher, namely, 41.27,
S.D. 26.4. If this difference between senescent
men and women is truly indicative of other
possible sex differences for the aged, it may be
of no little scientific and practical significance.
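
The disproportion between the sexes is easier to see when each senescent mean is expressed as a relative increase over the corresponding younger norm. The following is a minimal sketch using only the four means reported above; the helper name is hypothetical.

```python
def percent_increase(younger_mean, older_mean):
    """Relative increase of the older group's mean over the younger norm, in per cent."""
    return 100.0 * (older_mean - younger_mean) / younger_mean


# Mean finger tremor (mm.) as reported in the text
men_increase = percent_increase(39.83, 42.67)
women_increase = percent_increase(30.33, 41.27)

print(f"senescent men:   {men_increase:.1f} per cent above younger men")
print(f"senescent women: {women_increase:.1f} per cent above younger women")
```

On these figures the senescent men are about 7 per cent above the younger men, while the senescent women are about 36 per cent above the younger women.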


Discussion 


The competence of so-called aged people,
whatever that may mean, is to be understood
as an individual matter. There are many
individuals above 45 who are not competent
to do good skilled work. On the other hand,
there are skilled workers of much greater age
who are entirely competent and perhaps better
both in quality and quantity of production
than many younger workers. The personnel
problem is not one of merely grouping all
individuals above some certain age and calling
them incompetent. Our study gives emphasis 
to the need for discovering what old people are 
entirely competent, and what aged people have 
been made incompetent not because of chronological
age but because of sickness, injury, or
deterioration. It is reasonable to expect that 
many older people need more frequent rest 
periods than do younger workers. It is even 
more important to recognize the fact that 
many older workers are of the highest com- 
petence and are often found to be superior in 
judgment, foresight, carefulness, freedom from 
accidents, and trustworthiness. For our study 
old age has meant 60-85, with practically as 
good hand steadiness as that of students aged 
16-35. What does old age mean as it is
commonly used in business and industry? Is
it much more than a superstition held by those
who still continue in the fossilized thinking of
by-gone ages?


Summary 


1. So far as we have data on body sway, it 
appears that many people whose ages reach at 
least 70 are no less steady standing in the erect 
position than are many younger people. A
considerable number of senescents are considerably
more steady than the younger Ss.


2. In our experiments upon finger tremor it is 
clear that senescents do not differ greatly from 
the average of 1000 college students. It is 
clearly evident that the average finger tremor 
for senile patients is more than three times as 
great as that of presumably healthy senescents 
whose average age was a little greater than that 
of the seniles (70.4 as compared with 67.7). 

3. Senescent women were found to have a
disproportionately greater increase of finger
tremor than senescent men. Although this
was not great, it suggests a problem worthy of
further research to discover whether greater
deterioration and inability appear in aging
women than in aging men.


4. It is suggested that competence for various
kinds of work can only very inadequately be
judged in terms of any such rough and inaccurate
indication as that given by chronological
age; and that abilities of men and
women should be decided by means of such
accurate methods as are here suggested in one
area of human behavior.
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Reading Ease of Commonly Used Tests * 


Ralph H. Johnson 


Veterans Administration, Minneapolis


and 
Guy L. Bond 


University of Minnesota 


The authors of this paper are concerned with 
the use of vocational tests requiring reading 
skills in the expanding post war counseling 
programs in personnel selection, school guid- 
ance programs, and the vocational counseling 
of veterans. 

Numerous intelligence tests requiring reading
skills are being administered to general
population groups, veteran and non-veteran,
and similar tests continue to be administered
to students at all grade levels in survey testing
for individual guidance purposes. Recent
developments in the area of vocational counseling
have also been characterized by the increased
use of the Kuder Occupational Preference
Record at the junior high school level and by
more frequent use of the Strong Vocational
Interest Blank at the senior high school level,
plus the extensive use of both tests in the
vocational counseling of World War II
veterans. It appears that some clarification is
needed concerning the reading levels of the 
general population and student groups and 
the extent to which vocational tests which 
require reading skills match these reading 
levels. 

This article approaches the problem by indi- 
cating the general types and prevalence of 
reading limitations among the academic and 
general population. Second, this article
attempts to arrive at a relative determination of
the readability level of tests commonly used
in counseling and group testing situations.
Third, this article suggests observations and
conclusions that may have implications of
importance concerning the use of tests requiring
reading abilities in vocational counseling
situations.


* The opinions expressed herein are the views of the 
authors and are not to be construed as representing the 
Veterans Administration 


Clients assigned to tests requiring the use of 
reading skills may be limited in their compre- 
hension of written materials because their in- 
tellectual capacity prevents them from com- 
prehending material written beyond the ninth 
and tenth grade levels; or, the clients may 
have reading disabilities which precluded their 
reading growth from developing at the same 
pace as their intellectual development. A 
client in the latter category is considered a 
reading case if there is a significant degree of 
difference between his mental age and his 
reading age. At the junior high school level, a 
student is considered to be a reading case if his 
reading age is two or more years below his 
mental age (20). Mental age in the above 
definition is considered to be the mental age 
derived from an individual test of mental 
ability such as the Stanford-Binet or
Wechsler-Bellevue.


Table 1

Reading Level of Adults with IQ’s Below 100

[Table body not recoverable from this copy. Columns: IQ; Reading Grades. Legible IQ entries: 75, 80.]


In the general population, the prevalence of
reading limitation ascribed to lack of
intellectual capacity is reflected by Table 1. Table
1 is derived from Gates’ (13) table for translating
age scores into grade scores. Bond (5)
states that normal reading grade at maturity,
as indicated by Table 1, may be considerably
increased by exposure to good teaching.
Other indications of the reading level of the
general population are the following: (a)
Unpublished results of Army and Navy surveys
which indicate mean reading grade levels of
between grades 8 and 10 for veterans of World
War II; (b) Lorge and Blau (19) found an
average reading grade level of 9.2 among 242
tested adult WPA subjects; (c) Census data
for 1940 give grade 8.4 as the average school
grade completed by our population. This is of
significance when considered with Witty’s (24)
conclusion that the average attainment in
reading of elementary school graduates
harmonizes closely with grade expectancy.

An indication of the prevalence of reading
disability cases is reported in a study by Gray
(14), who states that one out of five junior high
school students in Chicago had a reading
disability or was considered a reading case.
Another study by Monroe (20) indicates that four
out of five reading cases are boys.

Assuming that these studies generally represent
an approximate indication of the reading
limitations of the general population, it is
incumbent upon counselors to exercise care in
the selection and interpretation of tests which
require varying degrees of reading skills.
Since test results may vary depending upon the
reading level of the client, estimating and, if
possible, knowing the approximate reading
level of the client and the readability level of
the test seems necessary to insure reasonably
accurate test selection and interpretation.

Clues which may give a general indication
of the client’s reading level are the following:
(a) Results of a test of mental ability, if
available; (b) Vocabulary level of the client as
revealed in the interview; (c) Client’s facility
at interpreting questionnaire material; (d)
Content of client’s correspondence; (e) Client’s
educational level (use with caution, since an
individual’s reading level may vary as much
as six grades from his educational level).

Observation by the psychometrist or counselor
of the client’s behavior in the test room
situation may prove very meaningful in
detecting clues as to the reading level and type
of reading disability characterizing the client.
These clues may consist of the following: (a)
Excessive articulation and lip reading; (b) Use
of crutches in reading (pointing with finger and
pencil); (c) Frequent regression in reading
passage; (d) Excessive fixations per line; (e)
Frequent consultation with psychometrist
regarding comprehension of test directions and
items; (f) Speed of reading; (g) Discrepancies
between verbal and non-verbal tests and
discrepancies between subtests; (h) Complaints
of tired eyes.

A quick check of the difficulty of a passage
for an individual is the number of words he
misunderstands and misses orally out of 20.
Betts (4) states that if a student in the
elementary or secondary grade level misunderstands
over one word in 20, he may lose the
meaning of the passage. This type of check
may prove efficient in a testing situation. If
there is sufficient time, group or individual
reading survey tests would give the desired
information.

Assuming the counselor’s selection of indi- 
vidual tests in a test battery is made to con- 
form with the client’s reading level, one must 
make the further assumption that the coun- 
selors have information regarding the reading 
level of various vocational tests. 

The authors have felt that more information
was needed regarding the reading level of
various tests which are being given rather
routinely to academic and general population
groups, and have attempted to measure their
reading level by applying the Flesch formula to
the more commonly used tests in vocational
counseling (2, 3). In most instances, the
formula was applied to the directions
separately.

Formulas for measuring the readability level
(comprehension) of grade school text books
have been in use for the past 25 years and
Klare (17) states that to date there are about
34 formulas or methods available.

The Flesch formula was used in this limited
survey for the following reasons: It is an
efficient formula; its author (9, 11) claims it has
been used with success in sampling the reading
level of adult reading materials; it has been
applied experimentally to the readability level
of Public Opinion Questionnaires (16), and
Klare (17) states that it correlates significantly
with other formulas. Flesch’s revised formula
(11) is based on the number of syllables per
100 words and the average sentence length in
words. Scores resulting from this formula
vary from 100 to zero. A score of 100
corresponds to the prediction that a child who has
completed the fourth grade will be able to
answer three-fourths of the test questions asked
about a passage that is being rated. A score of zero
indicates that the passage is practically
unreadable.
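
For readers who wish to reproduce such scores, the calculation can be sketched as follows. The constants are those of Flesch's published Reading Ease formula, RE = 206.835 - 0.846 wl - 1.015 sl, where wl is the number of syllables per 100 words and sl is the average sentence length in words. The syllable counter and the sentence splitter are crude illustrative assumptions, not the hand-counting procedure the authors would have followed, and the function names are hypothetical.

```python
import re


def flesch_reading_ease(syllables_per_100_words, words_per_sentence):
    """Flesch's revised Reading Ease score from the two counts it is based on."""
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * words_per_sentence


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of vowels,
    minus a final silent 'e'. Hand counts will differ for many words."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def reading_ease_of_text(text):
    """Estimate Reading Ease for a passage using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables_per_100_words = 100.0 * sum(count_syllables(w) for w in words) / len(words)
    words_per_sentence = len(words) / len(sentences)
    return flesch_reading_ease(syllables_per_100_words, words_per_sentence)


# A passage averaging 150 syllables per 100 words and 20 words per sentence
example_score = flesch_reading_ease(150, 20)
print(round(example_score, 1))
```

The raw formula can yield values above 100 or below zero for extreme passages; this sketch leaves them unclamped.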

For further information regarding develop- 
ment, standardization, reliability, and validity 
of this formula, the reader is referred to publica- 
tions by Flesch (9, 10, 11, 12). 

The application of the formula to some 
commonly used vocational tests gave the re- 
sults indicated by Tables 2 and 3. Table 2 
represents tests for which an overall selected 
sample was used in determining readability 
level. Due to the spiral nature of the tests 
listed in Table 3, three separate measures of 
reading ease were made. 

In determining the Reading Ease of multiple 
choice test items, only the correct response was 
counted. Had all five of the possible re- 
sponses been included, sentence length and 
word length would have been increased, re- 
sulting in a lower (more difficult) Reading 
Ease score. Therefore, the results listed in
Tables 2 and 3 tend to represent a conservative
estimate of the reading difficulty of the test
items sampled.


Table 2

Over-all Readability Levels of Selected Tests
as Determined by Application of
the Flesch Formula

Test                                           Reading Ease   Grade Level

Bennett Mechanical Comprehension                    90        [illegible]
Minnesota Multiphasic Personality Inventory         88            6.0
Directions for Minnesota Clerical                   87            6.0
Bell Adjustment Inventory                           80            7.0
Directions for Bell                             [illegible]       9.5*
California Interest Inventory                       65            9.0*
Kuder Occupational Preference Record                60            9.5*
Directions for Kuder                                70            8.0
College G.E.D. No. 2                                59           10.0*
Ohio State Psychological Test (Part 3)              37        [illegible]
Strong Vocational Interest Blank                    35        [illegible]
Directions for Strong                               73        [illegible]
Allport-Vernon                                      35        [illegible]
Directions for Allport-Vernon                   [illegible]   [illegible]

* Starred grade scores represent Flesch’s corrected
grade placement for the area of extrapolation beyond
the 7th grade.

The authors recognize that the application
of the Flesch formula to test items in
vocational counseling test batteries was not the
purpose for which the formula was designed.
Therefore, the following definite limitations of
the formula must be considered in any
interpretations resulting from its application: (a)
Reading grade placement scores above grade 7
represent estimated corrections for area of
extrapolation beyond grade 7; (b) The formula
does not appear to measure the effect of
complexity caused by double negatives in a
test such as the Minnesota Multiphasic
Personality Inventory; (c) The formula does not
appear to measure the rather involved
directions in some parts of the Strong Interest Test;
(d) The above results have not been verified
by any extended sampling and, with the
exception of the interest test items, sampling is
limited to test items that are complete
sentences; (e) The formula is not designed to
measure the readability level of test items 
that are not complete sentences. Therefore, 
the readability level of the Strong, Kuder, and 
California Interest Tests, as indicated in Table 
2, can be considered valid only in the sense that 
they indicate relative difficulty of the several 
tests. 

It appears that a formula which measures
word complexity and abstractness may be
more appropriate in measuring the readability
level of interest test items which contain single
words and phrases. The authors did not
attempt to apply a formula of this nature to the
interest tests. However, the Lewerenz
formula, which is based on vocabulary alone, was
used by Stefflre (21) in computing the
reading difficulty of interest inventories and, with
the exception of the results on the Study of
Values, there appears to be relative general
agreement on the readability level of interest
tests as measured by the Flesch and Lewerenz
formulas. It is of interest to note that
Aukerman (1) questions the importance of specific
vocabulary ability in reading situations, since
his study indicated that good students were
significantly superior to poor students in
general reading ability, and the fact that good
students and poor students were not
significantly different in either general or specific
vocabulary ability indicates that knowledge of
words is less important than reading ability
as a whole.


Table 3

Grade and Reading Ease Level of Some Intelligence Tests as Determined by the
Application of the Flesch Formula

[Table body only partly recoverable from this copy. Column groups: Directions,
First Items, Middle Items, Last Items, each giving Reading Ease (R.E.) and Grade
Level (G.L.). Row stubs: A.G.C.T. (a. Block counting; b. Arithmetic items;
c. Vocabulary items); Henmon-Nelson, Form A, Grades 7-12 (a. Non-arithmetic
items; b. Arithmetic items); Kuhlmann-Anderson, Grade 6 (a. Sample I;
b. Sample II); Otis Higher Exam, Form C; Terman-McNemar, Form D
(a. Information test; b. Logical selection; c. Analogies; d. Best answers).
Legible entries whose row and column positions are uncertain: 93, 5.5; 91, 5.5;
78; 86; 90, 5.5, 58, 10.0*; 64, 9.0*, 10, 17.0*; 87, 6.0, 80, 7.0;
62, 9.0*, 63, 9.0*.]

* Starred grade scores represent Flesch’s estimated corrected grade placement for the area of extrapolation
beyond the 7th grade. Reading grade placement scores indicated by Tables 2 and 3 derived from Flesch’s (9, 11)
conversion tables.


A limited check on the accuracy of the
Flesch formula was made by applying the
Lorge (18) formula to the Minnesota
Multiphasic Personality Inventory and the
Dale-Chall (8) formula to a sample of the
information section of the Terman-McNemar
Intelligence Test. The application of the Lorge
formula to the same sampling of the MMPI
resulted in a grade score of 6.1, which is in
agreement with results on the Flesch formula.
The Lorge formula is based on average
sentence length, number of prepositional phrases
per 100 words, and number of difficult words
not appearing in Dale’s list of 769 easy words.
The application of the Dale-Chall formula to
the Terman-McNemar Information Test gave
a corrected grade level score of 12, which is
somewhat higher than the grade score arrived
at for the same sample (last items) by the
application of the Flesch formula. The
Dale-Chall formula is based on sentence length and
number of difficult words not appearing on
Dale’s list of 3000 familiar words.

Scrutiny of Tables 2 and 3 suggests tentative
observations which may have implications of
value to counselors if the results in the tables
can be accepted as approximating the reading
levels indicated. In some instances, test
directions gave a reading grade score below the
readability level of test items. Interest tests,
particularly the Strong Interest Test and the
Study of Values, appear to be at a reading
level considerably higher than that attained
by the mean of the high school or general
population. The relatively high difficulty
level of the Strong Interest Test, as reflected
by the Flesch and Lewerenz formulas, seems to
be in accord with the educational level of
Strong’s Occupational Criterion Groups, since
his groups are considerably above the mean
of the general population. Strong (22) lists
the average educational level of each occupa- 
tional criterion group, and they range from 
10.4 to 19.0. The mean educational attain- 
ment of 35 out of 39 of Strong’s Occupational 
Criterion Groups was 14.5, and the standard 
deviation of the group was 2.4. 

The Kuder Preference Inventory appears to 
have a reading level above that of the average 
junior high school student. Accordingly, it is 
of interest that a study by Christensen (7) 
indicated that 9th grade pupils of mean average 
intelligence had erroneous ideas concerning the 
meanings attached to 21 key words in the 
Kuder Preference Record. After instruction 
concerning the meaning of these terms, the 
results indicated that such instruction prob- 
ably played a role in causing subjects to change 
their preferences. 

The Bell and the MMPI personality tests 
have reading comprehension levels which 
would appear to be understood by most adults 
and junior high school groups. 

Some of the commonly used intelligence
tests measured appear to be at a reading level
above that of the group for which the tests were
designed and also appear to be spiral power
tests which may to some degree be measuring
reading skills. Center and Persons (6) found
that after one semester of remedial reading
instruction, some pupils gained from 12 to 17
IQ points on the Terman Group Test of Mental
Ability. A class of 40 pupils made an average
gain of 4.3 IQ points under the above
conditions. The Otis and the Kuhlmann-Anderson
seem to more nearly match the reading levels of
the groups for which they were designed.

The block counting and arithmetic items of
the AGCT appear to be at the reading level of
adults whose mental age is 13 and above.
These two subtests of the AGCT would be in
accord with the reading level of most junior
and senior high school students. The reading
level of the entire test would not be too difficult
for most senior high school students and above
average adults, since the most difficult section
has a reading level of about grade 10.
However, it must be remembered that over one half
of the adult population is below this level and
may be mismeasured.

Some non-arithmetic items of the Henmon-Nelson
test and the logical selection section of the
Terman-McNemar are at about the college reading
level, that is, about grade 14.


Summary 


Assuming that the above observations are 
generally correct, the findings of applying 
reading ease formulas to directions and items 
of commonly used tests may be summed up 
as follows: 


1. Counselors, psychometrists, teachers and 
personnel technicians may have to re-evaluate 
their past and present practices regarding test 
selection, administration and interpretation. 

2. Psychometrists and counselors cannot 
assume that if a client comprehends the direc- 
tions for a test he will necessarily comprehend 
the test items. 

3. There is need for interest inventories that 
junior high school students and the general 
population can easily comprehend. 

4. The reading level of the Bell and MMPI 
tests appears to be well adapted to most of the 
general population and most junior high school 
groups. However, the variability in reading 
ability in these groups would indicate that even 
these tests would not adequately measure the 
clients in the lower end of the reading distri- 
bution. 

5. The Army General Classification Test 
would appear to have possibilities for more 
extensive use at the junior and senior high 
school levels and for use with general population
groups. It also has possibilities for use
as a substitute for individual tests of mental 
ability. This conclusion is fortified by a recent 
study by Tamminen (23) which shows a cor- 
relation of .83 between the Wechsler-Bellevue 
and the AGCT. 

6. There is need for group tests of mental 
ability that are not affected by an individual’s 
abilities in reading. It appears that some of 
our commonly used intelligence tests tend to 
favor those individuals who have, because of 
their environment, attained a high degree of 
reading skill. If we are to encourage and 
assist in the development of an individual’s 
natural potentialities, we need reliable means 
of measuring his real, innate intelligence, no
matter how inadequate his reading skills may
be. Our country’s brain power is likely its
greatest resource and certainly every effort
should be made to discover and develop it.


. Flesch, R. 
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How Readable are Occupational Information Booklets? 


Arthur H. Brayfield and Patricia Aepli Reed 


University of California, Berkeley, California 


Dullness and incomprehensibility have never 
to our knowledge been postulated as twin goals 
for writers of occupational information. In- 
deed, the injunction to make such information 
interesting and readily understandable has 
been sounded since the early days of the 
vocational guidance movement. 

“It (occupational information) needs to be 
put in terms that can be used by all kinds and 
conditions of people it should be made 
available in very simple and popular form” 
was the admonition of Charles R. Richards as 
early as 1915 (3, 514). In the mid-twenties 
Mary C. Schauffler wrote: “A study may be 
clear, definite, accurate, and brief and yet be 
dull reading, and on that account, may be of 
little value for classroom purposes. It is not 
enough that material be presented clearly and 
directly, it must also be presented in a manner 
interesting and understandable to the group 
that is to use it” (4, 136). More recently, in a
comprehensive treatment of occupational infor- 
mation published in 1946, Shartle expressed 
the belief that “While information must be 
carefully checked for accuracy, it must also be 
arranged and written in a style that is readable 
and easy to use” (5, 76). 

What are the facts? Have writers and 
publishers of occupational information suc- 
ceeded in producing readable materials? The 
writers have discovered no studies bearing on 
the problem; this paper reports a preliminary 
investigation of the question “How readable 
is occupational information?” 


Procedure 


To answer this question we applied the revised
Flesch method of measuring readability
and human interest to sample passages from 
current occupational information literature. 

The revised Flesch readability formulas were 
described at some length in 1948 (1). Formula 
A is essentially a test of level of abstraction and 
is thought to be an index of comprehension 
difficulty. Formula B predicts the effect of 


two “human interest” elements on comprehen- 
sion. Flesch considers its real value to lie in 
the fact that “human interest will also increase 
the reader’s attention and his motivation for 
continued reading” (1, 226). 

An attempt was made to sample widely the
current occupational literature. Publications
covering a variety of occupations and industries
were included from many different publishing
sources. A major difficulty encountered was
the paucity of occupational information at the
so-called lower job levels.

Reading ease and human interest scores were
determined for these materials as follows: (a)
Five samples were chosen from each piece of
writing; (b) Each sample was marked off to
include 100 words; and (c) The steps outlined by
Flesch were taken to compute the readability
and human interest scores.
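
The sampling steps described above can be sketched in code. The text does not say how the five samples were located within each publication; the version below assumes they are spaced evenly from the first to the last 100 words, and the function name and defaults are illustrative only.

```python
def take_samples(text, n_samples=5, sample_len=100):
    """Return up to n_samples word lists of sample_len words each.

    Assumption: samples start at evenly spaced positions running from the
    beginning of the text to its final sample_len words. Short texts may
    yield overlapping samples.
    """
    words = text.split()
    if not words:
        return []
    if len(words) <= sample_len:
        return [words]
    last_start = len(words) - sample_len
    if n_samples == 1:
        starts = [0]
    else:
        starts = [round(i * last_start / (n_samples - 1)) for i in range(n_samples)]
    return [words[start:start + sample_len] for start in starts]


# A stand-in "publication" of 1000 numbered words
publication = " ".join(f"w{i}" for i in range(1000))
samples = take_samples(publication)
print(len(samples), [sample[0] for sample in samples])
```

Each 100-word sample would then be scored separately and the five scores averaged for the publication.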


Results 


First, we analyzed 31 publications describing 
professional level occupations. The results are 
given in Table 1. The readability standards 
suggested by Flesch (1) were referred to for 
interpretation of the findings. 

Flesch groups reading ease scores into seven
categories ranging from “Very easy” to “Very
difficult” with a further description according
to representative magazine publications
ranging from “Comics” to “Scientific.” The most
striking fact obtained from this analysis is that
65 per cent of the publications studied fall into
the category “Very difficult” as represented by
scientific journals and the remainder are
classified as “Difficult” or equivalent to academic
magazines.
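
The seven categories can be written out as a lookup. The score ranges and magazine types below follow Flesch's published interpretation table as commonly cited; only some of the labels are confirmed by the present text, so the cut points should be treated as an assumption, as should the handling of scores that fall exactly on a boundary.

```python
# (upper bound of score range, description of style, typical magazine)
# Cut points are taken from Flesch's interpretation table as commonly cited;
# they are not stated in the article itself.
FLESCH_CATEGORIES = [
    (30, "Very difficult", "Scientific"),
    (50, "Difficult", "Academic"),
    (60, "Fairly difficult", "Quality"),
    (70, "Standard", "Digests"),
    (80, "Fairly easy", "Slick fiction"),
    (90, "Easy", "Pulp fiction"),
    (100, "Very easy", "Comics"),
]


def describe_reading_ease(score):
    """Map a Reading Ease score to (description, typical magazine).

    Scores outside 0-100 are clamped; a score on a boundary is assigned
    to the easier category (an arbitrary choice made for this sketch).
    """
    clamped = max(0.0, min(100.0, score))
    for upper_bound, description, magazine in FLESCH_CATEGORIES:
        if clamped < upper_bound:
            return description, magazine
    return FLESCH_CATEGORIES[-1][1], FLESCH_CATEGORIES[-1][2]


for score in (25, 45, 75):
    print(score, describe_reading_ease(score))
```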

With respect to human interest, Flesch
describes five categories of scores ranging from
“Dull” to “Dramatic.” Typical magazines
range from “Scientific” to “Fiction.” Of
these 31 publications 71 per cent are judged by
the Flesch formula for human interest to be
“Dull” and appropriate to scientific journals.
The remaining 23 per cent are rated in the
adjacent category “Mildly interesting” on a
par with trade publications.


Table 1

“Reading Ease” and “Human Interest” Scores for Publications, Arranged by Source,
Describing Selected Professional Level Occupations

[Table body not recoverable from this copy. Columns: Source, followed by scores
for each occupation (Lawyer, Engineer, Teacher, Pharmacist). Sources listed:
U.S. Department of Labor; U.S. Employment Service; Science Research Associates;
Institute for Research; Bellman Publishing Company; Occupational Index, Inc.;
Professional Organizations.]

It might be hypothesized that these findings 
are accounted for by the complexity and tech- 
nical nature of professional occupations. The 
findings of our study of a limited sampling of 
skilled and semi-skilled occupations lend only 
slight support to this contention. 

The data of Table 2 indicate that these 19 
publications are slightly more readable than 
those for professional occupations: only 53 per 
cent fall into the “Very difficult” or scientific 
classification for reading ease. However, none 
of the samples rated less than “Difficult” or 
academic, which is the next category on the 
Flesch scale. 

Results for the human interest analysis 
follow the same pattern: 68 per cent may be 
categorized as “Dull” or scientific. The 
remainder, with one exception, are comparable 
to trade journals or “Mildly interesting.” The 
exception barely rates as “Interesting.” 

Industrial as well as occupational classifica- 
tions were studied. The results for reading 
ease of these 16 publications dealing with 
entire industries are reported in Table 3. 
They are similar to the previous findings. Of 
these, 62 per cent rank as “Very difficult.” An 
additional one fourth rated “Difficult,” one 
was only “Fairly difficult,” similar to quality 
magazines, none were “Standard,” and in the 
remaining publication, a Science Research 
Associates pamphlet, the hotel industry is 
described in a way which earns it a description 
as “Fairly easy,” like slick fiction. 


Table 2 

“Reading Ease” and “Human Interest” Scores for Publications, Arranged by Source, 
Describing Selected Skilled and Semi-Skilled Level Occupations 

Occupation: Welder 

Source 
U.S. Department of Labor 
U.S. Employment Service 
Science Research Associates 
Institute for Research 
Bellman Publishing Company 
Occupational Index, Inc. 
Commonwealth Book Company 


Table 3 

“Reading Ease” and “Human Interest” Scores for Publications, Arranged by Source, 
Describing Selected Industries 

Industry: Retail 

Source 
Science Research Associates 52 13 23 
Institute for Research 3 
Bellman Publishing Company 8 14 31 
Occupational Index, Inc. 16 4 
Commonwealth Book Company 16 9 
Western Personnel Institute 


Human interest values are slightly more 
favorable for the potential consumer of these 
publications. Only 37 per cent rate at the 
extreme as “Dull.” However, all but one of 
the rest fall into the adjacent “Mildly inter- 
esting” classification. The single exception is 
judged “Interesting” like digest magazines. 

It was not a prime objective of this investi- 
gation to make comparisons among publishers 
with the one exception described in the next 
paragraph. It may be noted in passing though 
that there do not appear to be any significant 
trends. No differences of any magnitude in 
readability are suggested by these data al- 
though they might appear if the entire series 
of publications from each source were studied 
in a similar manner. Actually none of these 
publishers appears to be producing materials 
which begin to meet Flesch standards indicative 
of easy and interesting reading. 

An analysis was made of the publications of 
business and industrial concerns themselves to 
determine whether or not the highly ballyhooed 
advertising practices of “big business” have in- 
fluenced the readability of the occupational 
information supplied by such firms. The re- 
sults of the study of the occupational informa- 
tion materials of twelve companies are re- 
ported in Table 4. These include such repre- 
sentative titles as “What Shell Means to You,” 
“Opportunities for Employment,” and “What 
about Your Future?” 

Actually, 83 per cent rate as “Very difficult.” 
The remaining two publications are split be- 
tween “Difficult” and “Fairly difficult.” 
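The tally just reported can be checked against the reading ease scores listed in Table 4. The sketch below assumes the cut points of Flesch's published Reading Ease scale (30, 50, 60, 70, 80, 90); the category labels are those used in the text, while the function and variable names are illustrative only.

```python
from collections import Counter

# Upper bounds of each Reading Ease band; cut points are assumed from
# Flesch's published scale, not stated in this article.
READING_EASE_BANDS = [
    (30, "Very difficult"),    # scientific
    (50, "Difficult"),         # academic
    (60, "Fairly difficult"),  # quality
    (70, "Standard"),          # digests
    (80, "Fairly easy"),       # slick fiction
    (90, "Easy"),
]

def reading_ease_category(score):
    """Return the Flesch category label for a Reading Ease score."""
    for upper_bound, label in READING_EASE_BANDS:
        if score < upper_bound:
            return label
    return "Very easy"  # comics

# Reading ease scores as listed for the twelve companies in Table 4.
table_4_reading_ease = {
    "Burroughs": 20, "Chase National Bank": 29, "Corning Glass Works": 23,
    "Equitable Life Insurance": 14, "General Motors": 15, "J. C. Penney": 51,
    "Proctor and Gamble": 7, "Roos Bros. Department Store": 36,
    "Shell Oil Company": 23, "Union Oil Company": 19,
    "United Air Lines": 21, "United States Steel": 22,
}

tally = Counter(reading_ease_category(s) for s in table_4_reading_ease.values())
share_very_difficult = 100 * tally["Very difficult"] / len(table_4_reading_ease)
print(tally, round(share_very_difficult))
```

Under these assumed cut points, ten of the twelve scores fall below 30, which agrees with the 83 per cent stated above, and the remaining two fall in the “Difficult” and “Fairly difficult” bands.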


Table 3 (continued). Industry: Motion Picture, Real Estate, Iron and Steel 



Two thirds are judged “Dull.” One fourth 
are “Mildly interesting” and only one publica- 
tion achieves a designation as “Interesting.” 


Summary 


In all, 78 pieces of occupational information 
literature from 24 different sources were ana- 
lyzed. Almost two thirds ranked as “Very 
difficult” or at the scientific level with respect 
to reading ease while another 32 per cent were 
ranked “Difficult.” 

Almost exactly the same proportions held for 
the categories “Dull” and “Mildly interesting” 
when human interest scores were studied. 


Fewer than five per cent of these publications 
reach the readability level of the popular “digest” 
magazines. 


Table 4 

“Reading Ease” and “Human Interest” Scores for 
Occupational Information Publications of 
Private Business and Industry 

Company                         “Reading Ease” 
Burroughs                       20 
Chase National Bank             29 
Corning Glass Works             23 
Equitable Life Insurance        14 
General Motors                  15 
J. C. Penney                    51 
Proctor and Gamble              7 
Roos Bros. Department Store     36 
Shell Oil Company               23 
Union Oil Company               19 
United Air Lines                21 
United States Steel             22 

Within the limitations of the Flesch formulas 
for measuring readability and the limitations 
of our sampling of occupations, industries, and 
publishers, we conclude that current occupa- 
tional information falls far short of meeting 
the requirements for comprehension and in- 
terest which have been suggested through the 
years by persons intimately concerned with 
the preparation and use of such information. 
Dullness and incomprehensibility reign 
supreme. Writers and publishers of occupa- 
tional information might well consult Flesch 
(2) and others if they are seriously interested 
in “The Art of Readable Writing.” 
Received October 28, 1949 


Personality Questionnaire * 


This is the third in a series of papers dealing 
with the Objectivity score on a short person- 
ality questionnaire designed for industrial use. 
The questionnaire is restricted and is not avail- 
able for general use, but the principles involved 
are more important than is the particular form. 
In the first paper (3) the questionnaire was de- 
scribed and the use of the Objectivity key, 
patterned after the L scale of the MMPI (2), 
was described. It was shown there that 
Emotionality scores, in particular, seemed to 
be related to Objectivity scores when the 
questionnaire was administered to industrial 
personnel. 

The information presented in that paper and 
the use of that information as described is of 
no value unless it can be shown that the Ob- 
jectivity key is a valid measure of “faking” and 
unless the other keys are also valid for measur- 
ing various personality characteristics. A 
second paper described an experiment with 
college students in which it was shown that the 
Objectivity key and other keys could be 
“faked,” if the respondents so desired (1). 

Faking the questionnaire to “look good” 
resulted in low Emotional and also low Objec- 
tivity scores that gave the interviewer a clue 
that “faking” might be present. Thus it was 
concluded that the Objectivity key is a valid 
measure of what it is intended to measure. 
As described in the first paper, the Objectivity 
key is interpreted in industrial practice to 
“protect” highly objective persons rather than 
to locate “fakers.” 

The purpose of the present paper is to 
present evidence on the validity of the Emo- 
tionality key. This key is shown to be a valid 
measure of maladjustment. Accordingly the 
validity of the procedure of using the Objec- 
tivity scores to determine the set of norms to 
use in interpreting the Emotionality key on a 
short personality questionnaire (50 items) in 
industrial screening is demonstrated.¹ 

* Published with permission of the Chief Medical 
Director, Department of Medicine and Surgery, Vet- 
erans Administration, who assumes no responsibility 
for the opinions expressed or the conclusions drawn by 
the authors. 


The Criterion Group 


For purposes of this study the personality 
questionnaire was administered routinely to a 
sample of 100 male patients consecutively ad- 
mitted to the Mental Hygiene Clinic of a 
Veterans Administration Center. All of the 
respondents had been admitted to the clinic as 
patients, and this fact is used as the criterion 
of their maladjustment. The specific nature 
of their maladjustments is unimportant here 
because the Emotionality key under discussion 
is a screening key and does not yield responses 
that are diagnostically refined. 

For comparative purposes a sample of 100 in- 
dustrial employees was selected from master 
data sheets of test scores. Every fifth man was 
selected, beginning with the most recent entry 
on the data sheets and proceeding until a 
sample of 100 cases was obtained. The two 
groups, clinical and industrial, are accordingly 
not matched in any respect except sex. 


Results 


Complete distributions of the scores of these 
two groups together with the means and 
standard deviations on each of four keys 
(Objectivity, Emotionality, Social Dominance, 
and Drive) are presented in Tables 1, 2, 3, 
and 4. 

¹ The writers wish to thank Miss Judy Yackle for 
assisting in the preparation of this paper, and the Psy- 
chological Trainees, particularly Arthur Schomp and 
Harold Gilberstad, at the VA Mental Hygiene Clinic, 
for helping to collect the data on patients. 


Table 1 

Distributions of Objectivity Scores of 100 Clinical 
Patients and 100 Industrial Employees 

Objectivity Score    Clinical Patients    Industrial Employees 
Mean                 2.7                  2.7 
S.D.                 1.5                  1.5 

The Objectivity Key. The two groups were 
surprisingly well matched when their Objec- 
tivity scores were compared. The mean score 
for each group was 2.7 and the S.D. for each 
group was 1.5, as is shown in Table 1. The 
critical ratio of the differences between the 
means of these distributions is .05, which is a 
non-significant difference. 


The Emotionality Key. The two groups 
were quite disparate on the Emotionality key, 
as shown in Table 2. The mean score for the 
clinical sample was 7.9, with an S.D. of 2.6. 
The mean for the industrial sample was 4.6, 
with an S.D. of 2.3. The critical ratio be- 
tween these two samples was 9.3, hence it is 
concluded that the difference is a real one. 
Since the clinical sample was maladjusted 
according to an outside criterion, and since the 
industrial sample was not known to be mal- 
adjusted, it is concluded that the Emotionality 
key is a valid measure of maladjustment. 


Table 2 

Distributions of Emotionality Scores of 100 Clinical 
Patients and 100 Industrial Employees 

Emotionality Score    Clinical Patients    Industrial Employees 
Mean                  7.9                  4.6 
S.D.                  2.6                  2.3 
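As an illustration of how a critical ratio of this kind is usually obtained, the sketch below divides the difference between two independent means by the standard error of that difference. This is an assumed, conventional form of the computation, not necessarily the exact procedure followed here, and the function name is hypothetical.

```python
import math

def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two independent means over its standard error."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# Emotionality key: clinical sample versus industrial sample, using the
# rounded means and S.D.s given in the text (N = 100 in each group).
emotionality_cr = critical_ratio(7.9, 2.6, 100, 4.6, 2.3, 100)
print(round(emotionality_cr, 1))
```

Worked from the rounded published figures, the ratio comes to about 9.5, close to the reported 9.3. Exact agreement is not to be expected, since the means and S.D.s are given to one decimal place only.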

The correlation between the Emotionality 
and the Objectivity scores for this clinical 
sample was .25, which is small enough to be of 
little or no practical significance. This is an 
important point since, as previously shown by 
a co-variation of distributions, these keys are 
rather highly related for industrial or college 
samples. For the industrial sample discussed 
in this paper, the correlation between these 
two keys was .52. 

Thus it is shown that there is not an in- 
evitable high degree of relationship between the 
Objectivity and the Emotionality scores ob- 
tained on this questionnaire. The fact that 
such a relationship does occur with industrial 
samples, and does occur when college students 
take the questionnaire and “fake” it, further 
strengthens the belief that low Objectivity 
scores are indicative of the existence of “fak- 
ing.” Stating this in terms of how this form 
is used in actual practice, the belief is strength- 
ened that persons who obtain high Objectivity 
scores and high Emotional scores in industrial 
interviewing situations should not necessarily 
be considered maladjusted. Rather, their 
scores should be interpreted on the basis of 
norms based upon persons of high Objectivity 
scores. Persons with low Objectivity scores 
and high Emotionality scores may be suspected 
of poor adjustment, requiring more intensive 
interviewing and testing procedures. The rela- 
tionship of Objectivity scores to Emotionality 
scores does not appear to be a spurious one in 
industrial practice. 

The Social Dominance and Drive Keys. On 
the Social Dominance scale the clinical patients 
obtained lower scores than did the industrial 
sample. These data are presented in Table 3. 
The clinical mean was 4.9, with an S.D. of 2.2. 
The industrial mean was 6.2, with an S.D. of 
1.7. The critical ratio between the two 
groups was 4.69, which indicates a significant 
difference. 


Table 3 

Distributions of Social Dominance Scores of 100 Clinical 
Patients and 100 Industrial Employees 

Social Dominance Score    Clinical Patients    Industrial Employees 
Mean                      4.9                  6.2 
S.D.                      2.2                  1.7 

On the Drive scale also the clinical patients 
were lower than the industrial sample as shown 
in Table 4. The clinical mean was 4.9 and 
the S.D. was 1.5. The industrial mean was 
5.8 and the S.D. was 1.6. The critical ratio 
between the two groups was 3.52, which is 
again a significant difference. 

Thus the clinical patients differed signifi- 
cantly from the industrial sample on three keys: 
Emotionality, Social Dominance, and Drive; 
but did not differ on Objectivity. The pre- 
vious paper by Carr and Rothe (1) showed that 
students, when “faking to look good,” obtained 
low Objectivity scores, low Emotionality scores, 
high Social Dominance scores, and unchanged 
Drive scores; when “faking to look bad” the 
students obtained high Objectivity scores, high 
Emotionality scores, low Social Dominance 
scores and low Drive scores. Thus the stu- 
dents, when “faking,” could alter their scores 
on all three keys, but their extreme Objectivity 
scores revealed that they were “faking” and 
made possible an adjustment in the interpre- 
tation of their scores. 


Table 4 

Distributions of Drive Scores of 100 Clinical Patients 
and 100 Industrial Employees 

Drive Score    Clinical Patients    Industrial Employees 
Mean           4.9                  5.8 
S.D.           1.5                  1.6 


The clinical patients were apparently “nor- 
mal” in their Objectivity, but low in Social 
Dominance and low in Drive. Accordingly, it 
is concluded that these patients as a group were 
actually low in those two characteristics. 
There is no reason to suspect that they were 
“faking” their scores on those two keys, par- 
ticularly since they ranked on the low, and for 
many positions the undesirable, ends of the 
scales. 


Conclusions 


Data are presented in this paper showing 
that clinical patients who may be considered 
neurotic obtained significantly higher scores on 
the Emotionality key of this short personality 
questionnaire, when contrasted with a random 
sample of industrial personnel. It is concluded 
from these data that this form is a valid 
screening device for indicating neuroticism. 
The form used here is not available for general 
use, since it has been developed by one com- 
pany to aid its interviewing procedures. But 
the principles described here have broader im- 
plications. These are: (1) that a very short 
personality questionnaire for industrial pur- 
poses can be a valid screening device that will 
indicate neuroticism; and (2) that the use of an 
Objectivity key, similar to the L scale of the 
MMPI, can be a valid device that will aid in 
the interpretation of scores. It is apparent 
that other personality questionnaires can be 
made that will be as valid and as short as this 
one, and hence the restriction of the distribu- 
tion of this form is actually not a serious 
handicap. 


Overall Job Success as a Basis for Employee Ratings 


C. E. Jurgensen 
Minneapolis Gas Company 


The “halo” effect in merit rating is well 
known, and many different procedures have 
been devised to reduce its effect. In spite of 
such attempts, intercorrelations have remained 
so high as to cast doubt on the value of ratings 
on separate factors. In some cases intercor- 
relations have actually been higher than trait 
reliabilities (4). This has led some investi- 
gators (1, 5, 6) to recommend overall ratings 
variously called “performance on present job,” 
“value to the company,” “overall job suc- 
cess,” etc. 

Overall ratings lack the specific information 
found in trait ratings, but this objection is more 
theoretical than practical if intercorrelations 
between trait ratings are higher than their 
reliabilities. Overall ratings have certain ad- 
vantages over trait ratings. They are apt to 
agree with the foreman’s statements when he 
is questioned regarding an employee, and they 
are apt to agree with promotions, transfers, 
demotions, discharges and other personnel 
moves which are supposedly based on merit. 
Some persons condemn overall ratings on the 
basis that they are not analytical and may not 
be based on objective evidence. Yet these 
same persons may “validate” a trait scale on 
the basis of overall ratings or on promotions 
based on overall judgments. It would appear, 
then, that many persons believe that overall 
ratings are more valid than other means of 
determining employee merit and that some 
persons who disagree with this viewpoint in 
theory accept it in practice. 

This report discusses two types of employee 
ratings based on overall job success. 


Part I. Rank Order Merit Ratings 


The rank order rating technique has re- 
ceived comparatively little attention in the lit- 
erature of industrial psychology. The method 
consists of ranking employees on a specified 
continuum from best to worst. In the cases 
described here the continuum consisted of over- 


all merit. The name of each employee was 
typed on a 3" × 5" index card. The pack of 
name cards was given the rater with the infor- 
mation that names were in no particular order. 
Simultaneously, the pack of name cards was 
shuffled. Instructions were: “Please arrange 
these names in order from best to worst. 
Your best employee should be placed on top 
and the worst on the bottom. You can start 
from either end, or start from both ends and 
work toward the middle. You can make as 
many changes as you wish.” 

Rank orders were subsequently converted to 
scale scores advocated by Hull (3) and Guilford 
(2). Intercorrelation between the two types of 
scale scores was .998 with a total of 115 cases. 
Only the Hull method was subsequently used 
and reported here. 


Case 1. Three foreladies ranked forty-four 
Inspector-Packers on an overall basis. Cor- 
relations between the raters were .88, .80, and 
.76. Stepped up reliability for the sum of the 
three ratings was .93. 

Case 2. Three foreladies ranked twelve In- 
spector-Packers on an overall basis. Correla- 
tions between raters were .89, .77, and .72. 
Stepped up reliability was .92. 
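The stepped up reliabilities in Cases 1 and 2 can be reproduced, as a sketch, by applying the Spearman-Brown prophecy formula to the average of the three inter-rater correlations. Averaging the coefficients first is an assumption made here for illustration; the text does not say how the three values were combined, and the function name is hypothetical.

```python
from statistics import mean

def spearman_brown(r, k):
    """Reliability of the sum of k ratings whose typical intercorrelation is r."""
    return k * r / (1 + (k - 1) * r)

# Inter-rater correlations reported for the three foreladies in each case.
case_1 = spearman_brown(mean([0.88, 0.80, 0.76]), k=3)
case_2 = spearman_brown(mean([0.89, 0.77, 0.72]), k=3)
print(round(case_1, 2), round(case_2, 2))
```

Under this assumption the two values round to .93 and .92, matching the figures given for the sum of the three ratings.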

Case 3. Five foreladies ranked twenty- 
three Inspector-Packers on an overall basis. 
One month later the ranking was repeated 
without reference to the prior rankings. Cor- 
relations between the first and second rankings 
obtained for each forelady were .89, .88, .86, 
.76, and .73. Stepped up reliabilities for the 
sum of the two rankings by each forelady were 
.94, .94, .93, .86, and .85. 

Intercorrelations were computed between 
the five raters. These are given in Table 1. 
Four of the five correlations lower than .80 in- 
volved forelady D who had been a forelady for 
less than one month and had had no previous 
rating experience or training. She was sub- 
sequently (and unrelated to these or other 
ratings) demoted to a non-supervisory position. 


Table 1 

Intercorrelations Between Five Foreladies Ranking 
Twenty-Three Inspector-Packers 


Forelady B likewise had less than one month 
experience, but was subsequently considered 
a “good” forelady. 

Case 4. Three inexperienced foreladies 
ranked thirty-four Inspector-Packers on an 
overall basis. Each forelady had been in 
supervisory work less than one month, and 
none had received any training in rating pur- 
poses or procedures. Intercorrelations be- 
tween their ratings were .54, .53, and .19. 

Case 5. An Assistant Sales Manager ranked 
fifty-two Salesmen on an overall basis of “value 
to the company as a salesman.” To facilitate 
the ranking of a number as large as fifty-two, 
he was first asked to sort the name cards into 
three piles (“best,” “average,” and “worst”) 
containing approximately the same number of 
names each. The three groups were then 
placed in rank order and merged. Changes of 
opinion in rank order were permitted until the 
judge was completely satisfied with his rank 
order. One month later the process was re- 
peated. The Pearsonian correlation between 
the repeated ratings (scaled scores) was .94. 
Stepped up reliability for the sum of the two 
ratings was .97. 


Rating reliability of this magnitude is so 
high as to be suspect. No evidence was found 
to indicate it was spuriously high or that 
similar results could not be obtained by other 
persons under the same conditions. It is 
believed that the high reliability obtained was 
due primarily to four factors: (1) Rank order 
merit ratings on an overall basis may inherently 
be more reliable than more frequently used 
procedures; (2) The rater was an exceedingly 
conscientious person and was highly motivated 
to be as accurate as possible in his ratings; (3) 
During the previous year the rater had re- 
ceived almost twenty hours of individual 
training in rating purposes, principles, and 
procedures; and (4) The rater was well ac- 
quainted with his subordinates, having worked 
intimately with them for several years. 


Conclusions for Part I 


1. Ratings obtained from experienced super- 
visors were more reliable than those obtained 
from inexperienced supervisors. 

2. Highest reliability was obtained from a 
supervisor who was highly motivated, had 
received individual training in rating, and was 
thoroughly familiar with the work of his sub- 
ordinates. 

3. Rank order ratings on an overall basis are 
simple to obtain and can have a high degree of 
reliability. 


Part II. Multiple Item Scale for 
Rating Overall Job Success 


The simple rank order method just described 
will not always be suitable. It requires that 
ratings be made for a group of employees. 
Sometimes a single employee must be rated, as 
when determining action to be taken on a 
probationary employee. Although rank order 
rating is not applicable, an overall rating may 
be desired. However, it is somewhat disturb- 
ing to base a decision on a single overall rating 
in view of the commonly accepted hypothesis 
that, other factors being equal, the reliability 
of a measurement is related to the number of 
items making up the measurement. 

This section deals with an attempt to devise 
a four item scale in which each item consists of 
an overall rating of job success. Such a scale 
permits computation of split-half reliabilities 
for groups of employees and internal con- 
sistency for a single employee. 

In developing a multiple item rating scale 
for overall job success it was considered im- 
portant: (1) to word items in such a way that 
their identicalness was not obvious; (2) to 
provide a varying number of rating categories 
so that raters would not automatically check 
the same category in each item; and (3) to 
provide two open end questions on each em- 
ployee’s strong points and weak points. A 
copy of such a scale is given in Figure 1. 

Figure 1. Employee evaluation report including the composite normalized score weights 
based on total of 810 ratings. 

Procedure. Twenty-one supervisors repre- 
senting all divisions of the company filled in 405 
Employee Evaluation Reports. The largest 
number of employees rated by a single super- 
visor was thirty-seven and the smallest number 
was four. 

Ratings on each item were converted into 
normalized standard scores using the mid-point 
percentile method. These scores were ex- 
pressed with a mean of 50 and a sigma of 10 
and comprise the weights given each rating. 
One month after the first ratings were obtained, 
the same supervisors again rated the same em- 
ployees. Standard score weights were com- 
puted as previously. 

The normalized score weights for the first 
ratings of 405 employees were compared with 
those of the repeated ratings. The mean dif- 
ference was less than .3. Figure 1 includes the 
composite normalized score weights based on 
the total N of 810. These weights apply 
strictly only to the raters and employees in- 
cluded in the study. The method of selecting 
raters and employees permits generalization 
throughout the company concerned, but 
weights cannot be assumed to be applicable 
to other companies. 
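The mid-point percentile conversion can be sketched as follows, on the usual reading of that method: each rating category is located at the percentile midway through its own frequency, that percentile is turned into a normal deviate, and the deviate is rescaled to a mean of 50 and a sigma of 10. The category counts below are hypothetical, chosen only to total 810, and the function name is illustrative.

```python
from statistics import NormalDist

def normalized_weights(category_counts, mean=50.0, sigma=10.0):
    """Normalized standard score for each rating category, lowest category first.

    Assumes every category was used at least once, so that each mid-point
    percentile lies strictly between 0 and 1.
    """
    total = sum(category_counts)
    standard_normal = NormalDist()
    weights = []
    cases_below = 0
    for count in category_counts:
        midpoint_proportion = (cases_below + count / 2) / total
        deviate = standard_normal.inv_cdf(midpoint_proportion)
        weights.append(mean + sigma * deviate)
        cases_below += count
    return weights

# Hypothetical distribution of 810 ratings over a five-step item.
weights = normalized_weights([40, 160, 410, 160, 40])
print([round(w, 1) for w in weights])
```

Because each item is expressed on the same scale, weights from different items can be added directly, which is what the reliability computations that follow rely on.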

Scale Reliability. Inasmuch as weights are 
expressed in terms of standard scores, weights 
of two or more items can be added directly. 
This permits computation of split half reli- 
ability where two items comprise one half and 
the remaining two items the other half. The 
reliability (stepped up, as usual, by the 
Spearman-Brown prophecy formula) was .94 
for the first set of ratings (N=405) and like- 
wise .94 for the second set of ratings (N = 405). 
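A split half computation of this kind might look like the sketch below: the weights of two items are summed into one half, the other two into the second half, the halves are correlated, and the correlation is stepped up by the Spearman-Brown formula. Which items formed each half is not stated, so pairing items (a, b) against (c, d) is an assumption, and the six rows of weights are hypothetical.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def split_half_reliability(item_weights):
    """Stepped up correlation between the (a + b) half and the (c + d) half."""
    first_half = [a + b for a, b, _, _ in item_weights]
    second_half = [c + d for _, _, c, d in item_weights]
    r = pearson_r(first_half, second_half)
    return 2 * r / (1 + r)  # Spearman-Brown for a test of doubled length

# Hypothetical standard score weights on items (a, b, c, d) for six employees.
ratings = [
    (62, 58, 60, 64),
    (45, 50, 47, 44),
    (55, 52, 58, 50),
    (38, 41, 36, 40),
    (50, 55, 49, 53),
    (66, 60, 63, 61),
]
reliability = split_half_reliability(ratings)
print(round(reliability, 2))
```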

Twelve supervisors each rated seventeen or 
more employees. Split half reliability was 
computed for each individual supervisor. 
These reliabilities ranged from .82 to .99 with 
a median of .96 for the first set of ratings. For 
the repeated set of ratings these reliabilities 
ranged from .87 to .98 with a median of .94. 

Repeat reliability was .88 for the total of 405 
employees. For individual supervisors who 
rated seventeen or more employees, the reli- 
abilities ranged from .78 to .98 with a median 
of .92. 

Four supervisors each rated one group of 36 
employees, thus providing six intercorrelations. 
For the first set of ratings they ranged from .60 
to .83 with a median of .71. For the repeated 
ratings they ranged from .67 to .84 with a 
median of .76. 

The above reliabilities follow the usual 
pattern of highest reliability for split halves, 
next highest for repeated ratings, and lowest 
for ratings by different supervisors. All reli- 
abilities appear high as compared with reli- 
abilities typically obtained from ratings. 
This is particularly significant in view of the 
fact that no supervisor was given any training 
whatsoever in the use of this scale, and few, if 
any, had received training in the use of any 
rating scale. This procedure was deliberately 
followed to avoid possible difficulties resulting 
from training on an experimental scale which 
might subsequently be discarded. The as- 
sumption was made that if the scale was found 
to be satisfactory without any training it 
would be even more so with training. 

Item Reliability. Reliability of the scale as 
a whole is dependent on the reliability of each 
item in the scale. Item reliabilities were there- 
fore computed for the 405 repeated ratings by 
means of contingency coefficients corrected for 
errors of grouping due to small number of 
classes. The coefficients reported here can be 
considered equivalent to the customary Pear- 
sonian correlation. Repeat reliabilities of the 
four items (N=405) are: (a) .87; (b) .80; (c) 
.82; and (d) .90. 

Item Consistency. Item reliability can also 
be interpreted on the basis of item consistency. 
The percentage of employees (N=405) 
given identical ratings on two occasions on 
each of the items was: (a) 77.0%; (b) 64.9%; 
(c) 61.5%; and (d) 81.0%. Very few dis- 
crepancies greater than one step were found 
for any item, such discrepancies being: (a) 
1.0%; (b) 2.0%; (c) 2.2%; and (d) .7%. Be- 
cause of the varying number of steps in the 
scale items the above percentages are not 
strictly comparable. An index was therefore 
devised to express the number and extent of 
inconsistencies, in relation to the maximum 
spread.¹ 

The item indexes for repeated ratings 
(N= 405) were: (a) .95; (b) .91; (c) .93; and 
(d) .93. Item indexes for ratings by different 
raters (N= 432) based on all possible pairings 
of 36 employees rated by four supervisors 
were: (a) .94; (b) .88; (c) .92; and (d) .89. 
Item consistency appears satisfactory, and no 
one item stands out as appreciably superior 
or inferior to any other item.² 

Item Intercorrelations. The four items com- 
prising the scale were intended to be four 
ratings of the same characteristic, namely an 
overall rating of job success. Split half reli- 
ability was found to be high, thus indicating 
that the two halves were equivalent. Further 
evidence on this point is obtained from inter- 
correlations of the four items. These range 
from .82 to .90, thus being essentially the same, 
and essentially the same magnitude as the 
repeat reliability of each item. 

¹ The formula is: 

$$\text{Index} = 1 - \frac{N_1 + 2N_2 + 3N_3 + \cdots + nN_n}{n\,(N_0 + N_1 + N_2 + N_3 + \cdots + N_n)}$$

where N = number of ratings with the step discrep- 
ancy indicated by the subscript, and n = maximum 
step discrepancy. The formula is at a maximum of 
1.00 when ratings are completely consistent and at a 
minimum of .00 when each employee is rated as high as 
possible at one time and as low as possible at another 
time. 

² Item b contained the phrase “next 20%” in two 
places. This proved deceptive to raters, and appears 
to have lowered the reliability and consistency of this 
item. It has subsequently been reworded to read “next 
lowest 20%” and “next highest 20%,” and it is believed 
that the reliability of the item will be improved con- 
siderably. 
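The consistency index described in the footnote can be sketched as one minus the weighted share of step discrepancies. This reading is chosen because it yields 1.00 when all ratings are identical and .00 when every rating shifts by the maximum possible number of steps, as the footnote requires. The function name is hypothetical, and the worked inputs for item (a) rest on two assumptions that the text does not state: that the item allows a maximum discrepancy of five steps, and that the 1.0% of larger discrepancies were all exactly two steps.

```python
def consistency_index(counts_by_discrepancy, max_discrepancy):
    """Consistency index for one item.

    counts_by_discrepancy[k] is the number (or percentage) of rating pairs
    that differ by k steps; max_discrepancy is the largest difference the
    item's scale allows.
    """
    total = sum(counts_by_discrepancy)
    weighted = sum(k * count for k, count in enumerate(counts_by_discrepancy))
    return 1 - weighted / (max_discrepancy * total)

# Item (a): 77.0% identical, 22.0% assumed one step apart, 1.0% assumed
# two steps apart, on a scale assumed to allow five steps of discrepancy.
item_a = consistency_index([77.0, 22.0, 1.0], max_discrepancy=5)
print(round(item_a, 3))
```

With these assumed inputs the index comes to about .95, in line with the value reported for item (a) on repeated ratings.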


Conclusions for Part II 


1. Without receiving any training in the use 
of the scale, twenty-one supervisors rated 405 
employees on two occasions on a four item 
overall merit rating scale. Under these cir- 
cumstances split half reliability was .94, repeat 
reliability was .88 and correlation of ratings 
by different supervisors ranged from .60 to 
.84 with n’s of 36 each. 

2. Item reliabilities and consistency indexes 
were satisfactory for each of the four items. 


3. Item intercorrelations were of essentially 
the same magnitude as repeat reliabilities. 

4. It is reasonable to believe that the results 
would be even more favorable if supervisors 
were trained in the use of the rating scale. It 
is therefore concluded that the technique of 
multiple overall ratings is a promising and 
practical technique. 


A Comparison of the Terman-Miles M-F Test and the 
Mf Scale of the MMPI 


Olga E. de Cillis and William D. Orbison 


University of Connecticut 


Vocational counseling presents many occa- 
sions to use both the Terman-Miles Attitude- 
Interest Analysis Test and the Minnesota 
Multiphasic Personality Inventory.¹ The use 
of these tests, however, has frequently revealed 
marked discrepancies in scores on the mascu- 
linity-femininity scales. These discrepancies, 
like the low correlations reported in the litera- 
ture for other masculinity-femininity scales (10, 
12), suggest the hypothesis that although the 
T-M and the MMPI tests may discriminate 
between the sexes, they are nevertheless mea- 
suring different aspects of masculinity-femin- 
inity. An opportunity² to test this hypothesis 
on a large group was provided when both tests 
were incorporated in a battery of tests being 
used provisionally in the School of Business 
Administration at The University of Con- 
necticut. 


Subjects 


The subjects used in this study were 129 
men and 50 women undergraduate students at 
The University of Connecticut. The 129 men 
were junior and senior students from the School 
of Business Administration enrolled in a course 
in industrial psychology. They ranged in age 
from 20 to 31; the median age was 24. The 
50 women were freshmen and sophomores in 
the College of Arts and Sciences enrolled in an 
introductory course in psychology. Their ages 
ranged from 17 to 25; the median age of the 
women was 18. 


Procedure 


The tests used were the T-M, Form B and the group form of the MMPI.
Test administration took place during class periods with a lapse of
one to two weeks between each administration. Presentation of the
tests was varied in AB, BA order. The nature and purpose of the study
were concealed in order to avoid the falsification to which
masculinity-femininity tests are particularly susceptible (Kelly,
Miles and Terman (11), Gough (6), Benton (1), and Meehl and Hathaway
(15)).

¹ Hereafter the Terman-Miles Attitude-Interest Analysis Test will be
referred to as T-M; the Minnesota Multiphasic Personality Inventory
as MMPI.

² Kindly provided by Dean Laurence J. Ackerman.

The MMPI was machine scored; the T-M was hand scored. Since the
chances for making errors in hand scoring the T-M are great, each
test was hand scored by two different individuals. Errors revealed by
differences in the two scorings were corrected.


Results 


MMPI. The men in this sample obtain a score somewhat more feminine
than that reported by Schmidt (17) for men in a non-college
population (M = 18.0) and somewhat more feminine also than the mean
for the normed population. The manual reports a mean raw score of
20.5 (9, p. 11) with a standard deviation of 5.0; the present study
has yielded a mean raw score of 24.6, standard deviation 4.86
(Table 1). The fact that the men in this study obtain a more feminine
score is to be expected since it has been demonstrated that college
men score above the published norms (Brown (2), Hathaway³ (2

The data for women, as far as mean raw score and dispersion are
concerned, compare more favorably with the norms. The manual cites a
mean raw score of 36.5 (9, p. 12) with a standard deviation of 5.0.
In this study a mean raw score of 37.1 and a standard deviation of
4.30 were obtained. The data of this investigation accord with the
norms as well as with the results of Verniaud (20) on a non-college
population and with Brown (2), Hampton (8), Lough (12), and Nance
(16) on college populations.

T-M.
The mean raw scores and standard deviations for men and women on the
T-M are presented in Table 1. Judged by norms presented in the manual
(18, p. 8), the males are fairly typical. A mean of 67.4 and standard
deviation of 47.65 reported for 130 college men on Form A compare
well with the obtained mean of 73.7 and standard deviation of 43.96
of this study.

³ Cited from personal communication in Brown (2, p. ).


Table 1

Means and Standard Deviations of M-F Test Scores

Test      Group     Mean Raw Scores    Standard Deviations
MMPI*     Men             24.6                4.86
          Women           37.1                4.30
T-M†      Men             73.7               43.96
          Women          -32.6               42.45

* For the MMPI larger raw scores mean greater femininity for both
sexes. The manual provides a set of norms for each sex in which
larger T scores indicate a deviation in the direction of the opposite
sex pattern.

† For the T-M test, larger positive raw scores mean greater
masculinity; larger negative raw scores mean greater femininity. In
interpreting the percentile scores the larger the percentile for men
the greater the masculinity; the larger the percentile score for
women, the greater the masculinity.

However, the women in this study prove considerably more masculine
than are similar groups described in the manual (18, p. 5). The mean
for college women on Form A is -60.8 with a standard deviation of
39.15 (18, p. 8). When both forms are averaged the median score is
-65 (18, p. 13); the standard deviation which may be estimated from
that table is 44.15. In this study a mean of -32.6 with a standard
deviation of 42.45 was obtained (Table 1). Evaluated by the T-M
manual, the women of this investigation score at the level of M.D. or
Ph.D. women.

Because the mean for women cited in the manual (18, p. 8) is not
comparable with the mean of this study, a t test was computed. The
obtained t value is 4.08 for 178 degrees of freedom. This difference
is reliable beyond the .01 per cent level. Thus in regard to the T-M,
the mean obtained for women in this study is significantly lower,
hence more masculine, than that of the normed population. The
standard deviations, however, are about the same. The F ratio of the
two variances is 1.17, which is not significant.
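
The t and F values reported above can be approximately recovered from the summary statistics alone. The following is a minimal sketch, not the authors' stated procedure: it assumes the standard error of the difference was computed without pooling the variances, and it assumes a norm group of 130 women, a figure inferred only from the reported 178 degrees of freedom (50 + 130 - 2). The function names are hypothetical.

```python
import math


def unpooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two means divided by its unpooled standard error."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


def variance_ratio(sd1, sd2):
    """F ratio of two variances, larger variance in the numerator."""
    larger, smaller = max(sd1, sd2), min(sd1, sd2)
    return (larger / smaller) ** 2


# Women of this study (N = 50) against the manual's Form A college women.
# ASSUMPTION: norm-group N = 130, inferred from the 178 degrees of freedom.
t = unpooled_t(-32.6, 42.45, 50, -60.8, 39.15, 130)
f = variance_ratio(42.45, 39.15)

print(f"t = {t:.2f}, F = {f:.3f}")
```

Under these assumptions the computed t agrees with the reported 4.08 to two decimals, and the variance ratio comes out just under 1.18, close to the reported 1.17.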


Differences between Men and Women on the T-M and MMPI. The
differences and their significances for each test as a whole are
shown in Table 2. All are significant beyond the .01 per cent level
of confidence. No less significant are differences between the sexes
on the separate exercises on the T-M.⁴ In each instance the result
justifies the assertion that men and women score differently on these
tests of masculinity-femininity, a finding that conforms with the
literature (3, 4, 5, 19).
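
As an arithmetic check on Table 2, each critical ratio should equal the difference between means divided by the standard error of that difference. The sketch below uses the absolute values printed in the table; the column labels for the four subtest columns are placeholders, since the source does not legibly identify which exercise each column belongs to.

```python
# (difference, standard error of difference, reported ratio), absolute values.
# The "subtest column" labels are hypothetical placeholders.
table2 = {
    "T-M total":        (106.33, 7.15, 14.90),
    "MMPI":             (12.46, 0.74, 16.82),
    "subtest column 1": (8.21, 1.36, 6.04),
    "subtest column 2": (9.72, 1.21, 8.03),
    "subtest column 3": (18.10, 3.42, 5.29),
    "subtest column 4": (59.40, 5.22, 11.38),
}

# Recompute each ratio and record how far it sits from the printed value.
discrepancies = {}
for label, (difference, se_diff, reported) in table2.items():
    recomputed = difference / se_diff
    discrepancies[label] = abs(recomputed - reported)
    print(f"{label:18s} recomputed {recomputed:6.2f}  reported {reported:6.2f}")

largest_gap = max(discrepancies.values())
```

All six recomputed ratios fall within a few hundredths of the printed values, which is what rounding of the standard errors to two decimals would produce.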

Correlations. Thus it has been established 
that both tests are measuring differences be- 
tween the sexes (Table 2). Yet the findings in 
the literature concerned with comparisons of 
other masculinity-femininity scales as well as 
observations made by the authors prior to this 
investigation led to the hypothesis that the 
T-M and the MMPI, although measuring some 
aspects of masculinity-femininity, do not mea- 

⁴ In this analysis as well as those that follow, exercises 2 and 7 of
the T-M are omitted since there was almost no variability for either
group. Attention is also called to the fact that these exercises are
the least reliable of the T-M test (19, p. ).


Table 2

Differences between Men and Women on the T-M and MMPI

                                              T-M subtests
              T-M        MMPI
Men          +73.7       24.64      4.25     +5.18    +30.34    +40.40
Women        -32.6       37.10     12.16     -4.54    +12.24    -19.00
Difference  +106.33     -12.46      8.21     +9.72    +18.10    +59.40
σD             7.15        .74      1.36      1.21      3.42      5.22
t             14.90     -16.82      6.04      8.03      5.29     11.38
P           All values are much less than .001


Table 3

Correlations between Tests of M-F; Correlations between MMPI and
Subtests of T-M

Group         rxy†       rx1y      rx3y      rx4y       rx5y       rx6y
Men          -.295***   +.134     -.100     -.214**    -.315***   +.078
Women        -.365**    +.168     -.305*    -.108      -.213      -.213
Difference   -.070      -.034     -.205     +.106      +.102      -.291

* Significant at the 5% level of confidence.

** Significant at the 1% level of confidence.

*** Significant at the .1% level of confidence.

† The subscripts refer to the sub-tests of the T-M. Subtests 1, 3, 4,
5, and 6 of the T-M were correlated with the total Mf scale of the
MMPI.

The confidence limits for 129 cases are: 5% level = .172; 1% level =
.204; .1% level = .285. For 50 cases the confidence limits are: 5%
level = .279; 1% level = .361.




sure the same aspects to any great extent. 
This hypothesis has been verified by the low 
correlations of the present study (Table 3). 
For men the correlation between T-M and 
MMPI is —.30; for women it is —.37. Both 


correlations are negative, as should be expected 
in correlating raw scores, since a high T-M raw 
score means masculinity and a high MMPI 
raw score means femininity. 

The significance of the departures of the regressions from linearity
was tested by the technique of analysis of variance (14, 255-62) for
two regressions: the regression of MMPI scores on the total T-M for
the male group, and the regression of MMPI scores on the total T-M
scores for the female group. In neither group does the appropriate F
ratio approach significance. For these groups, it can be concluded,
there is no significant departure from linearity. The other
regressions were judged by inspection to be linear.

The correlations obtained in this study are comparable in magnitude
with those cited in the literature for other masculinity-femininity
scales. Nance (16) compared the masculinity-femininity scales of the
Guilford-Martin, Strong, and MMPI. The correlations obtained averaged
+.40 for men and +.21 for women.

It may further be assumed that the several 
exercises comprising the T-M may them- 
selves measure somewhat different aspects 
of masculinity-femininity. Therefore, an at- 
tempt should be made to determine what 
variables each of the separate exercises of the 
T-M has in common with the MMPI. This 
has been done by correlating exercises 1, 3, 4, 5, 
and 6 of the T-M with the MMPI (Table 3). 


For women it is exercise three (information) 
which correlates most highly with the MMPI. 
For the male group exercises four (emotional 
and ethical response) and five (interests) 
yield the highest correlations. The correla- 
tions thus obtained have also been tested for 
significance by applying R. A. Fisher’s z trans- 
formation (14, 123-24). The correlations 
which attain each level of confidence have been 
appropriately indicated in Table 3. Only the 
results of exercises 3, 4, and 5 will be of concern 
in the following analysis since the correlations 
for the other exercises do not differ signifi- 
cantly from zero. 
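
The Fisher z transformation mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' worksheet: it uses the normal approximation with standard error 1/sqrt(N - 3), takes the rounded total-score correlations quoted in the text (-.30 for 129 men, -.37 for 50 women), and tests the difference between the two correlations with the usual two-sample z formula, which the source does not spell out. Function names are hypothetical.

```python
import math


def critical_r(n, z_crit=1.96):
    """Smallest |r| that differs from zero at the level implied by z_crit,
    using Fisher's z with standard error 1 / sqrt(n - 3)."""
    return math.tanh(z_crit / math.sqrt(n - 3))


def z_for_difference(r1, n1, r2, n2):
    """Normal deviate for the difference between two independent correlations."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


crit_men = critical_r(129)     # 5% limit for the 129 men
crit_women = critical_r(50)    # 5% limit for the 50 women
z_diff = z_for_difference(-0.30, 129, -0.37, 50)

print(f"5% limits: men {crit_men:.3f}, women {crit_women:.3f}")
print(f"z for men-women difference: {z_diff:.2f}")
```

The 5% limits obtained this way are close to the .172 and .279 printed beneath Table 3, and the deviate for the difference between the men's and women's correlations is well below 1.96, in line with the statement that the two do not differ significantly.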

The highest correlation for women, obtained between exercise three
(information) and the MMPI, the authors are at a loss to explain,
since the MMPI did not derive any items from this exercise of the T-M
(Table 4). This correlation, however, is significant at only the 5%
level of confidence and the difference from zero may have occurred by
chance. For men, the two exercises on the T-M that correlate most
highly with the MMPI are exercises four and five. As far as exercise
five is concerned, this finding is understandable since the bulk of
the items that were used in the MMPI were derived from this exercise
(Table 4).

Hathaway and McKinley report (9, p. 5), in discussing the choice of
items for the Mf scale, that “some were inspired by Terman and Miles,
and others are original.” A total of 43 items on the T-M were judged
to be identical or similar to 31 items on the MMPI (Table 4). The
following items may serve to illustrate this agreement: (1) T-M, “Are
your feelings often badly hurt?”; MMPI, “My feelings are not easily
hurt”; (2) T-M, “There is plenty of proof that life continues after
death”; MMPI, “I believe in a life hereafter”; (3) T-M, “Were you
ever fond of playing with snakes?”; MMPI, “I do not have a great fear
of snakes.”


Table 4

Analysis by Subtests of Items on the T-M that Occur on the MMPI

Exercise on T-M*                      No. of Identical or Similar
                                      Items on MMPI**
4. Emotional and Ethical Response
5. Interests
6. Personalities and Opinions
7. Introvertive Response

* Exercises 1-3 inclusive on the T-M have been omitted from this
table since they did not contribute any items to the MMPI.

** In this category are included either identical items that occurred
in Form A of the T-M or items that were similar in content but not in
format that occurred in Form B.


Of the 43 T-M items judged identical or similar to MMPI items, 25 are
from exercise five (interests). On the basis of “common elements,”
therefore, it is to be expected that the correlations with exercise
five would be highest. Referring to Table 3 it will be noted that
this correlation is highest for the male group but not for the female
group. However, the magnitudes of the correlations for either sex are
about the same, both are in the same direction, and they do not
differ significantly when tested.


Discussion 


It is evident that the T-M and the Mf scale of the MMPI do not show a
high correlation. The correlations are significantly different from
zero, however, a result which does indicate that the two tests are
measuring something in common. On the other hand the correlations are
low enough to suggest that the overlap is not great, a finding
substantiated in the correlational study made by Nance (16) with
other tests of masculinity-femininity. Much of what the two tests
have in common may be ascribed to exercises four and five.

Although the correlations between the two tests are not high and the
tests do not appear to be measuring the same thing, the highly
significant differences between men and women obtained would indicate
that both tests are validly measuring some differentiae of the sexes.
These findings are understandable if masculinity-femininity comprises
not one but many dimensions. Accordingly, the tests used in this
investigation would be measuring somewhat different dimensions. The
possibility of several masculinity-femininity factors has already
been indicated, although not conclusively, by Guilford and Guilford
(7), and Martin (13). A factor analysis of the T-M test determining
the common factors might prove more fruitful. Until some such
determination of what these scales are measuring, caution should be
used in interpreting these results in counseling.


Summary 


Fifty female and 129 male undergraduate students at the University of
Connecticut were given both the Terman-Miles M-F Test and the Mf
scale of the MMPI. Both tests clearly differentiated between the
sexes but did not correlate very highly with each other. It is
proposed that both tests are validly measuring different aspects of
masculinity-femininity interests and attitudes. A factorial analysis
of the Terman-Miles M-F test is suggested. Caution in using the
scales in counseling situations is recommended.


A Follow-Up Study on Satisfaction with Nursing


Helen Nahm 


Duke University 


During the Spring of 1947 a study¹ of the satisfaction of three
groups of students was made at the Duke University School of Nursing.
The groups consisted of 70 seniors (class of 1947), 62 juniors (class
of 1948), and 52 freshmen (class of 1949). To measure satisfaction an
adaptation of the Hoppock Job Satisfaction Scale² was used. To
measure factors associated with satisfaction students were asked to
respond to a number of questionnaire items designed to obtain
information about their reactions to the situation in the school of
nursing.

Findings of the study indicated that the 
freshmen students, at the end of a period of 
nine months in the school, were an enthusiastic 
and highly motivated group. Junior and 
senior students, on the other hand, were much 
less satisfied and showed many evidences of 
tension and frustration. Responses of the 
latter groups to questionnaire items indicated 
a genuine concern about the lack of time to 
give satisfactory care to patients; as well as to 
study, sleep, and participate in social and 
recreational activities. When asked to give 
suggestions for improvement these students 
recommended shorter and better-planned hours 
of work; the employment of a larger number of 
staff nurses so that patients might be given 
better care; more competent head nurses and 
supervisors; and a counselor to assist students 
with their personal, social, and emotional 
problems. They felt that courses should be 
made more interesting, and that methods of 
evaluating progress of students in their learning 
experiences on hospital divisions should be 
improved. 

In the year following the publication of findings of this study a
number of changes were made at the Duke University School of Nursing.
Hours per week of class work and clinical experience on hospital
divisions were reduced from 48 to 44. A number of more highly
qualified individuals were appointed to the faculty. Head nurses and
supervisors were encouraged to take courses in ward management and
teaching, and personnel work in schools of nursing. A more
diversified program of social and recreational activities was
provided. Every possible effort was made to create an environment in
the nurses’ residence conducive to the welfare and happiness of
students. Faculty members worked with individual student leaders and
with representative student groups to gain a better understanding of
student needs and problems, as well as to give active assistance in
planning for and making changes which seemed indicated.

¹ Nahm, Helen. Satisfaction with nursing. J. appl. Psychol., 1948,
32, 335-343.


Follow-up Study 


To determine whether there is a relationship between environmental
changes in a school of nursing and attitudes of students toward their
experiences in the school, follow-up studies were made of the
freshman students who participated in the original study (class of
1949) and also of the group of students who entered the school of
nursing during the fall of 1947 (class of 1950). The Nursing
Satisfaction Blank was administered to the two groups during the
Spring of 1948, and again during the Spring of 1949.

Changes in satisfaction of the class of 1949 
from the Spring of 1947 to the Spring of 1949 
are given in Table 1, and changes of the class 
of 1950 from the Spring of 1948 to the Spring of 
1949 in Table 2. Changes in mean scores and 
standard deviations for the two groups are 
given in Table 3. 

For the class of 1949 the correlations between satisfaction scores
for successive years are as follows: (a) Between 1947 and 1948 scores
= 0.54; (b) Between 1948 and 1949 scores = 0.53; and (c) Between 1947
and 1949 scores = 0.41. The differences between means of 1947 and
1948 scores (t = 5.59) and 1947 and 1949 scores (t = 3.80) are
significant at the 1 per cent level (in the direction of less
satisfaction). The difference between means of 1948 and 1949 scores
is not statistically significant (t = 1.30).

For the class of 1950 the correlation between 1948 and 1949 mean
scores is 0.54. The difference between means is not statistically
significant. From these findings it seems evident that, for the class
of 1949, there was a significant decrease in satisfaction from the
first to the second year, and no significant increase from the second
to the third year. For the class of 1950 the slight decrease in
satisfaction from the first to the second year is not statistically
significant.


Table 1

Changes in Satisfaction with Nursing from 1947 to 1949 of Students
Enrolled in the Class of 1949

[Cross-tabulation of 1947 ratings against 1949 ratings in the
categories Doesn’t like it, Indifferent, Likes it, and Enthusiastic,
with Total and Per cent.]


Table 2

Changes in Satisfaction with Nursing from 1948 to 1949 of Students
Enrolled in the Class of 1950

[Cross-tabulation of 1948 ratings against 1949 ratings in the
categories Doesn’t like it, Indifferent, Likes it, and Enthusiastic,
with Total and Per cent.]


Table 3

Changes in Mean Scores and Standard Deviations on the Nursing
Satisfaction Scale of Two Groups of Students of the Duke University
School of Nursing

               Class of 1949 (N = 45)     Class of 1950 (N = 40)
               Mean    S.D. of Dist.      Mean    S.D. of Dist.
Spring 1947    3 1.96
Spring 1948    2.44
Spring 1949    2.44
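
Because the same students were measured in successive years, the t values above concern differences between correlated means. The source does not state which formula was used; the sketch below assumes the classical form in which the standard error of the difference is reduced by the correlation between the two sets of scores. Only N = 45 and r = 0.54 are taken from the text. The means and standard deviations are hypothetical placeholders chosen for illustration, not values from this study.

```python
import math


def correlated_means_t(mean1, sd1, mean2, sd2, r, n):
    """t for the difference between two means measured on the same n persons,
    given the correlation r between the two sets of scores."""
    se1_sq = sd1 ** 2 / n
    se2_sq = sd2 ** 2 / n
    se_diff = math.sqrt(se1_sq + se2_sq - 2.0 * r * math.sqrt(se1_sq * se2_sq))
    return (mean1 - mean2) / se_diff


# N and r come from the text (class of 1949, 1947 vs. 1948 scores).
# HYPOTHETICAL: the means (20.0, 18.0) and SDs (2.0, 2.5) are made up.
t_paired = correlated_means_t(20.0, 2.0, 18.0, 2.5, r=0.54, n=45)
t_if_uncorrelated = correlated_means_t(20.0, 2.0, 18.0, 2.5, r=0.0, n=45)

print(f"t with r = 0.54: {t_paired:.2f}")
print(f"t with r = 0:    {t_if_uncorrelated:.2f}")
```

The comparison of the two results shows why the correlations are reported alongside the t values: for the same means and standard deviations, a positive correlation between occasions shrinks the standard error and enlarges t.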


Factors Associated With Satisfaction 


Differences in percentages of students of the 
class of 1949 responding to various question- 
naire items designed to discover factors associ- 
ated with satisfaction in nursing indicated the 
changes which had taken place since the time 
of admission to the school. Percentage differ- 


ences (1947 and 1949) which are significant 
at the 1 per cent level indicate that students: 


were more likely to enjoy working with 
doctors and to feel that doctors approved 
of their work; 

more often had opportunities to express their 
ideas on hospital divisions; 

were less likely to enjoy living in the nurses’ 
residence; and 

were less likely to enjoy formal classes. 


Percentage differences which are significant 
at the 5 per cent level indicate that students: 


more often had opportunities to use their 
initiative on hospital divisions; 

were less likely to feel that the teaching program
on hospital divisions was adequate;
and

were less likely to feel they lacked social
skills needed to feel at ease in social
situations.


Attitude Changes 


For the class of 1949 (admitted in September 
1946) typical statements which indicate atti- 
tude changes which took place after admission 
to the school are given as follows: 


I like nursing better, now that I have had 
more experience. 

I have a better understanding of the profes- 
sion and its responsibilities. 

I feel more self-assurance when I care for 
sick people. 

I understand patients and others better. 

The thrill has worn off. 

I realize the many things a nurse needs to 
know in order to be proficient. 

I am more realistic and less sentimental
about nursing.


I have become more hardened to illness. 


For the class of 1950 (admitted in September 
1947) typical statements which indicate atti- 
tude changes are given as follows: 


I am more interested and enthusiastic than 
during the first year. 

I am less idealistic than when I entered. 

I have developed greater understanding of 
people. 

I derive deep satisfaction from the work. 

I didn’t realize it involved so much work and 
study. 

I have matured during the past year. 

I have gained more self-assurance. 

I am better able to accept responsibility. 

I like it, but am not as enthusiastic as when 
I entered. 


Eighty-five per cent of the class of 1949 and 
87 per cent of the class of 1950 said that, if it 
were possible to go back a few years, they 
would again enter the Duke University School 
of Nursing. The remainder of the students 
either were undecided, or said they would 
probably enter a school nearer home.


Future Plans of Students 


Eighty-one per cent of the class of 1949 said they would prefer
institutional nursing after graduation. Fifty-two per cent said they
would like to do general staff nursing and the remainder either
teaching, administration, or supervision. Sixty per cent felt they
would need additional preparation in some field of nursing after
graduation. Forty per cent believed that experience as a general
staff nurse was all that was needed.

When asked what they would like to be doing ten years from now, 90
per cent of the class of 1949 said they would like to be married.
Only 13 per cent of the group would plan to continue in some field of
nursing after marriage.


Summary 


The fact that, for the class of 1949, there was a significant
decrease in satisfaction from the first to the second year without a
corresponding increase from the second to the third year would seem
to indicate that, once students have lost their initial enthusiasm
for nursing, it is not easily regained. On the other hand, the fact
that, for the class of 1950, there was no significant decrease in
satisfaction with nursing from the first to the second year would
indicate that it is possible for students to retain their initial
enthusiasm for nursing.

Responses of students of the class of 1950 as to their attitude
changes following admission to the school seem, in general, more
favorable than those of the class of 1949. They indicate that some of
the concomitants of a satisfactory nursing school program are greater
understanding of self and others, deep personal satisfaction from the
work itself, and a maturing of the entire personality structure.
Responses of both groups indicate that students become less
idealistic about nursing and more realistic as they progress through
the school. This is probably both inevitable and desirable.

The statements which students make as to their attitude changes
indicate that only a few become actively aware of the great public
need for the future contribution which they may make. The fact that
only a small minority hope to continue in nursing after marriage
would tend to support this conclusion.


From both initial and follow-up studies of 
satisfaction of students of the Duke University 
School of Nursing the general conclusion may 
be drawn that there is an association between
changes in the total environmental situation
and the extent of satisfaction with nursing. 
It seems likely, however, that a period of from 
two to three years is required to change atti- 
tudes which are primarily negative to those 
which are predominantly positive. Students 
who have been very dissatisfied continue to be 
suspicious of the motives of individuals re- 
sponsible for a school, even though these indi- 
viduals make changes which are much desired. 
When the more advanced groups of students 
in a school are dissatisfied and unhappy, the 
attitudes of less advanced groups are inevitably 
affected after a period of time. However, as 
the satisfaction of each successive group of 
students increases, the morale of the entire 
student group undergoes a gradual change.
Suspicion gives way to confidence and trust, 
and an atmosphere in which each student may 
grow and develop is finally created. 


Attitudes of Veterans toward Vocational Guidance Services


Frederick J. Gaudet, A. Ralph Carli and Leland S. Dennegar 


Stevens Institute 


The problem of the evaluation of guidance services is in the
forefront of educational and psychological thinking today. Certainly,
at no time in the past has so much money been spent on educational
and vocational guidance nor at any time have so many highly trained
psychologists and guidance counselors been employed in this field.

The attempts to get a picture of the efficacy of this vast program
have been very limited. There has been only one published study which
used a control group.¹ The Central Office of the Veterans
Administration (VA), however, has in its files the results of many
attitudinal surveys of veterans who have received vocational guidance
in centers under its auspices. None of them, however, makes use of an
adequate sampling.²

A similar study was conducted during this same period by the Service
Center of the Psychological Corporation.³ This study dealt with a
non-veteran population and many of its questions have been
incorporated in the questionnaire on which the present study is
based. These items were included for the purpose of comparing veteran
with non-veteran guidance services in an effort to answer a question
posed by many psychologists as to the efficacy of the VA guidance
program. Doubts have been raised as to the value of a program which
was so huge that its inception was almost an emergency and for which
the limitation of funds per man implied a certain restriction on the
qualifications of counselors who could be employed.

The Psychological Corporation had counseled 1,184 employees in a
large industrial concern, many of whom would be transferred or
released at the termination of hostilities. An anonymous
questionnaire was circulated to all of these employees asking for
their evaluation of, or satisfaction with, the guidance process.
Replies were received from 685 men and women.

¹ Dech, A. O., and Reeves, P. Effects of advisement upon continuation
in training under P.L. 346. Sch. & Soc., 1948, 67, 429-431.

The present study was based on the results 
of a questionnaire sent to 200 veterans who 
had received counseling at the VA Guidance 
Center at Stevens Institute of Technology. 
The survey was designed to get a picture of the 
attitudes of veterans toward the guidance pro- 
gram as a whole, and also to get opinions on 
specific phases of the procedure with a view 
toward improving the services. 

The subjects were selected at random from 
the files of those who had received guidance 
between September 29, 1945 and February 16, 
1946. All of these veterans had visited the 
center at least one year prior to the time the 
survey was started. The questionnaire, along 
with a letter explaining the purpose of the 
study, was mailed to these 200 men, of whom 
81 answered. Those who did not answer the 
first letter were sent a second which produced 
an additional 51 responses, and a member of 
the staff interviewed those who did not answer 
either letter. Altogether 164 replies were 
secured. The remaining 36 did not answer the 
questionnaire for the following reasons: (1) 
Address changed, no forwarding address, 23; 
(2) Living outside of New Jersey, 5; (3) Re- 
enlisted, 2; (4) In VA hospital, 2; (5) Non-ex- 
istent address, 1; (6) Died, 1; (7) Could not re- 
call his impressions, 1; and (8) Refused to 
answer, 1. 
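
The sampling figures in this account can be tallied as a consistency check. The short sketch below uses only counts stated in the text (200 mailed, 81 and 51 replies to the two letters, 164 replies in all, the eight listed reasons for non-response, and the Psychological Corporation's 685 replies from 1,184 employees); the number reached by interview is not stated and is derived here by subtraction. Variable names are illustrative.

```python
mailed = 200
first_letter_replies = 81
second_letter_replies = 51
total_replies = 164

# Replies obtained by staff interview, derived by subtraction (not stated in the text).
interview_replies = total_replies - first_letter_replies - second_letter_replies

# Reasons given for the veterans from whom no questionnaire was secured.
non_response_reasons = {
    "address changed, no forwarding address": 23,
    "living outside of New Jersey": 5,
    "re-enlisted": 2,
    "in VA hospital": 2,
    "non-existent address": 1,
    "died": 1,
    "could not recall his impressions": 1,
    "refused to answer": 1,
}
non_respondents = sum(non_response_reasons.values())

stevens_return_rate = total_replies / mailed
psych_corp_return_rate = 685 / 1184

print(interview_replies, non_respondents)
print(f"Stevens {stevens_return_rate:.0%}, Psychological Corporation {psych_corp_return_rate:.0%}")
```

The listed reasons account for every veteran who did not reply, and the two return rates agree with the 82 per cent and 58 per cent figures quoted when the surveys are compared.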

With the exception of the man who refused 
to answer, and perhaps the one whose recol- 
lection was poor, it is apparent that probably 
no selective factor operated to distort the 
answers. 

The following questionnaire was used:

Authorization No.
(Draw a circle around the word which is your
answer to each of the questions on this page)

1.* As a result of your visit to this center did you get a better idea
    a. of your strongest abilities?          yes  no  doubtful
    b. of your less strong abilities?        yes  no  doubtful
2.  Did your results show you had underestimated your aptitudes and
    abilities
    a. for particular jobs?                  yes  no  doubtful
    b. in general?                           yes  no  doubtful
3.  Did the guidance and counseling you received here
    a. increase your self-confidence?        yes  no  doubtful
    b. decrease your self-confidence?        yes  no  doubtful
4.  Did the guidance and counseling on the whole give you a better
    understanding of yourself?               yes  no  doubtful
5.  Do you feel that your guidance and counseling was a worthwhile
    experience?                              yes  no  doubtful
6.  Would you recommend a similar counseling center for non-veterans
    at their own expense?                    yes  no  doubtful
7.  On which floor were you treated best?    1st  2nd  3rd
8.  On which floor were you treated least satisfactorily?
                                             1st  2nd  3rd
9.  What part of our job do you think we do best?
10. What part of our job do you think we do poorest?
11. We improve our work through criticism and suggestions. What can
    you suggest that we can do to change or improve our services?
12. At what age do you think guidance and counseling should be
    provided?


The first five items were designed to determine the effect of
counseling on the veteran’s view of himself. Item 6 was directed
toward obtaining a general reaction to the counseling service. As a
basis for self-criticism and an aid in improving the Stevens service,
items 7 through 10 provided an opportunity for evaluation of specific
aspects of the program; concrete suggestions for improvement were
requested in item 11. Item 12 solicited the veterans’ opinions
regarding the persistent question “When should one receive guidance?”
Of course, the correct answer is “continuously,” but until our
schools and colleges are able to furnish educational and vocational
guidance, this is merely a theoretical answer.

* Items marked with an asterisk were taken from the Psychological
Corporation questionnaire. Some were modified slightly.

The median age of the group was 25 years. 
Unfortunately, data on educational levels were 
not available for these men but a study of a 
similar sampling of 200 men who received counseling
during the same period indicates that
the median of the highest grade completed was 
11 with lower and upper quartiles of 10 and 12, 
respectively. 

Although responses were received from 164 
veterans, not all answered every question. 
This failure to answer was particularly frequent 
in b of items 1, 2, and 3, and in items 8 and 
10, in which the veteran thought he had an- 
swered the query in previous questions, or parts 
of questions, or presumably had nothing derog- 
atory to say. 

The answers to the first six items of the
questionnaire are indicated in Table 1. In
general, the results of the Psychological Corporation
and Stevens studies are fairly comparable.
It should be remembered in interpreting
these data that the Psychological Corporation
survey was based on a 58 per cent
return and that no study was made of those 
who did not answer. The Stevens data were 
based on a return of 82 per cent, and a follow-up 
which indicated that those who did not respond 
were probably not a selected group. 

The results for part a of item 1 indicate that
the two centers were similar in acquainting the
individual with his assets, but part b shows
that the Stevens Center did not stress the
man’s handicaps or liabilities as much as the
Psychological Corporation. Whether this emphasis
on “less strong abilities” is desirable or
not is a question of the philosophy of guidance,
namely, whether guidance should give the
client a picture of both his assets and his
liabilities, or whether it should stress assets.
Of course, both should be evaluated by the
counselor, and the client should be told of his
liabilities if he mentions educational or vocational
objectives in which he would be handicapped.

The responses to item 2b indicate that less
than one-third (30%) had under-estimated
their abilities in general. However, it is interesting
to observe that considerably more (44%)
had under-estimated their aptitudes and abilities
for particular jobs. Again, the responses
in the studies of the Psychological Corporation
and Stevens seem to agree.

Probably as important as obtaining a clear 
picture of his aptitudes and abilities, is the 
effect of guidance on the individual’s confidence 
in himself. The answers to item 3a indicate 
that 73 per cent of the veterans increased their 
self-confidence as a result of going through 
guidance, while only four per cent stated that 
their general self-confidence was lowered. It 
is probable that a good guidance procedure 
would increase the self-confidence of most
individuals but decrease that of a few whose
qualifications for particular jobs are questionable.
Whether the “right” ones had their self-confi- 
dence increased or decreased, these data do not 
tell us. The differences between the two
groups, veteran and non-veteran, are not
significant.

The answers to item 4 were intended to give 
an over-all picture of what the guidance process 
does to an individual in helping him see himself 
realistically. The results of the questionnaires 
of the two centers are again similar; about 
three-quarters felt they had a better under- 
standing of themselves. 

Item 5 is probably the key question in the 
whole questionnaire. The objective of an 
educational and vocational guidance center is 
not primarily to give the individual “a better 
understanding” of himself (item 4), but to give 
him an understanding of himself in relation to 
his educational and vocational future. This
dual function is probably reflected in the
considerably higher percentage of “yes” answers
to item 5 (95 per cent) than to item 4
(74 per cent). The percentages of “yes”
answers to the two questionnaires, Stevens 
and Psychological Corporation, are almost 
identical. 

Item 6 constitutes another method, a less 
direct one, of getting an over-all evaluation of 
what these veterans thought of the guidance 
program. It is possible that someone who had 
been through guidance would consider it de- 
sirable for himself but would not think it 
desirable or necessary for others. In both 
the Psychological Corporation and Stevens 
studies, the services were free. The question
asked was whether they would recommend
it for others who would have to pay. It will 
be observed that in both studies there is a 
decrease in the percentage of “yes” answers 
as compared to item 5, but the difference is not 
large. 

In comparing the two studies it should be 
noted that the Psychological Corporation 
study was based on a population who went 
through the guidance process voluntarily. 
The Stevens study included some veterans (22) 
who came to the guidance center because they 
asked for this service under Public Law (PL) 
346. The majority, however, were PL 16 
cases, disabled veterans (142) who were com- 
pelled to go to a VA center before they were 
permitted to take training under the VA. Of 
course, it is not implied that all disabled vet- 
erans came under compulsion. In fact, their 
disabilities may have caused them to want 
guidance more than the average individual. 

Items 7 and 8 were used in the questionnaire
in an attempt to “fractionate” the favorable or
unfavorable attitudes shown by the veterans.
The first floor of the Stevens Guidance Center
building was staffed by VA personnel; counseling
took place on the second and third
floors, and testing was done on the third floor.
One interesting feature of the replies is that
although the questionnaire did not offer an
opportunity to answer “all” to either of these
questions, 25 per cent wrote in this answer to
item 7, and only two per cent to item 8.
Discontent was more frequently located on the
first floor, as indicated by the low (5 per cent)
percentage of “yes” answers to this floor in
response to item 7 and the high (28 per cent)
percentage of “yes” to item 8. To determine
whether the least favorable attitude toward
the first floor in response to items 7 and 8 had
influenced the feeling toward the whole guidance
procedure, answers to items 7 and 8 were
correlated with items 4 and 5. The results
indicated that there was no inter-relationship.¹


Table 1

Reaction of Veterans to Vocational Guidance

                  Per Cent Answering
Item   Yes*       No            Doubtful      No Reply
1a     76 (76)    [illegible]   9             2
1b     43 (69)    [illegible]   15            33
2a     44 (51)    [illegible]   13            9
2b     30 (27)    [illegible]   11            23
3a     73 (80)    [illegible]   13            2
3b      4 (6)     [illegible]   9             27
4      74 (30)    [illegible]   13            1
5      91 (90)    [illegible]   5             1
6      84 (71)    7             [illegible]   1

* Percentages in parentheses are those obtained by
the Psychological Corporation in its questionnaire.

Items 9 and 10 were included in the questionnaire
to evaluate various parts of the
guidance process. It is interesting that the
most frequent answer to both questions was
“all good”: 38 per cent to item 9 and 25 per
cent to item 10. The next most frequent
favorable answer to item 9 (29 per cent) and
the most frequent unfavorable answer to item
10 (14 per cent) was counseling. Since counseling
might be considered the climax, or the
part of the guidance program requiring the
greatest professional skill, the frequency of
these answers is noteworthy in that the veterans
were able to recognize its importance to them. The
other answers to item 9 in order of frequency
were: testing, “no reply,” interviewing, amount
of time devoted to the guidance process, and
VA representatives. The other dissatisfactions
expressed in answers to item 10 in order
of frequency were: “no reply,” amount of time
devoted to the counseling, testing, VA representatives,
lack of placement aids, “nothing
good,” and location of the center.

Cues as to the above favorable and unfavorable
evaluations are indicated by the responses
to item 11. A tabulation of these responses
is presented in Table 2. Certainly,
these responses give the impression that the
respondents had taken the questionnaire
seriously. No comments appeared to be
facetious. The chief value of the comments
has been to make the counselors more keenly
aware of their need to learn what the counselee
seeks from guidance. Analysis of the comments
according to the counselor involved did
not indicate that the suggestions referred to
one more than another.

¹ A later study of a new group (200) of veterans who
went through the guidance center when it was staffed
with different VA personnel indicated that the answers
to items 7 and 8 were not due to VA procedure but to
the personnel carrying out these procedures. The
answers to these items for “first floor” were 23 per cent
and six per cent, respectively, in the second study.


Table 2

Veterans’ Suggestions for Improving the
Guidance Process

 1. Increase length of process: more counseling
    (number illegible), more tests (7), more school
    information (4), more job information (3), more
    information on veterans’ benefits (1)
 2. Decrease length of process: shorter program (17),
    fewer tests (7)
 3. Do placement work: find employment (2), follow
    up to see that veterans get jobs (8)
 4. Improve scheduling*
 5. Prove validity of tests
 6. Improve introduction to guidance
 7. Improve personal interest in counselee: take
    more interest (3), less interest (1)
 8. Improve personnel: generally (2), no women (1)
 9. Improve everything
10. Improve location
11. Others†

* At the time these veterans came to the center,
veterans frequently had to wait five or six weeks
between asking for guidance and the first appointment.

† Each of these was mentioned by only one veteran.
They ranged from “advertise services offered” to “give
veterans a letter of acceptance to school suggested.”

The answers to item 12 covered a wide 
range. The median age for which counseling 
was considered desirable was 18 years. 


Summary 


The present study was undertaken to determine
what attitudes toward the guidance
process were held by 164 veterans who had 
received counseling at Stevens Institute in 1945 
and 1946. In comparison with the data ob- 
tained from a non-veteran group which was 
counseled and later queried in a survey by the 
Psychological Corporation, the results were 
similar. The Stevens study suggests that in 
spite of limitations in time and money the 
veterans believed that they had profited from 
the VA program of educational and vocational 
guidance.
Received July 24, 1950 
Upper versus Lower Case Copy as a Factor in Typesetting Speed
for Linotype Trainees * 


Bernard Stern 


State University of Iowa 


The steadily increasing cost of labor in the 
newspaper industry has resulted in a continuing 
search for means and methods to bolster the 
productivity of newspaper employes. Especi- 
ally is this true of mechanical employes. These 
workers, who belong to long established unions, 
are frequently paid more highly than editorial 
workers. 

Thus, this writer was interested in noting in a 
book, How to make type readable, the following 
statement: ‘Material set in all capitals is read 
12 per cent more slowly than material set 
in lower case. Reader preferences are over- 
whelmingly in favor of lower case.” 

Together with that claim by Paterson and
Tinker was this statement: “It is apparent that
all-capital text retards speed of reading to a
striking degree. Few typographical factors
can be found which will retard reading to
this extent.”²

If these claims held true for newspaper
linotype operators as well as for readers, it
occurred to this writer that here lay a fertile
means of boosting productivity. Such workers
daily read and set in type millions of ems from
wire copy which comes printed in all-capitals
off thousands of teletype machines in newspaper
offices throughout the country. In addition
to this, these operators also set approximately
as much type from reporters’ typewritten
copy which is usually set in lower case.

The problem, stated more succinctly, is this:
in the light of Paterson and Tinker’s finding,
can linotype operators set more composition
in a comparable amount of time from wire
copy (upper case) or from typewritten copy
(lower case)?

* The writer would like to express his appreciation
for the aid of Professor Clayton d’A. Gerken, Psychology
Department; Professor Leslie G. Moeller, Marshall N.
Heyman, William J. Morrison and Henry Africa, School
of Journalism, State University of Iowa.

¹ Paterson, D. G. and Tinker, M. A. How to make
type readable. New York: Harper & Brothers, 1940,
p. 28 (obtainable from the authors, University of
Minnesota).

² Op. cit., p. 23.

If linotype operators could set an appreciably 
greater amount of copy from lower case copy 
rather than from upper case copy in a compara- 
ble amount of time, perhaps it would behoove 
the news wire services (Associated Press, 
United Press, International News Service) to 
make an adjustment in their teletype machines 
so that news stories could arrive in newspaper 
offices printed in lower case (caps and lower 
case). Or if this were not possible, it seemed 
likely that publishers could make a savings by 
employing typists to rewrite the all-capital 
wire copy into lower case. 

On the other hand, if linotype operators 
could set more composition from upper case 
wire copy, would it not be worthwhile to have 
reporters turn in their stories typed in all 
capitals? 

An ideal means of testing this proposition 
with linotype operator trainees was available 
to the writer in the Newspaper Production 
Laboratory of the University’s School of 
Journalism. Each semester the laboratory 
trains 12 to 15 student linotype operators. 

It should be emphasized here that the operators
participating in this experiment were
beginners. At the time this study was conducted
they were capable of turning out an
average of one-half galley of composition an 
hour compared to experienced newspaper 
linotype operators who can set more than twice 
this amount per hour. 

The findings set forth in this study, therefore, 
pertain solely to beginning linotype operators. 
No attempt is made to project the results to
the work of experienced operators.


Procedure 


This experiment was carried out after the
student linotype operators had spent about
twelve weeks in the Newspaper Production
Laboratory learning how to operate the linotype.
Twelve students were used. This group
was divided into two equal sections.

For this study, the author collected 130 news
stories which had come over the Associated
Press teletype at The Daily Iowan, the
student-published daily newspaper. These stories were
cut into page lengths of 8½ inches by 11 inches
and were retyped in lower case. Every effort
was made to get the same number of lines per
page and the same number of words per line in
the typewritten version as in the wire
(all-capital) story.

The 130 stories contained approximately
25,000 words and took up 110 pages. The
news stories varied in length from 60
words to 800 words and included the following
varieties of news: international, national,
crime and accidents, financial, reports of
speeches and meetings, weather and sports.
A breakdown of the sports category showed it
included stories about baseball, golf, tennis,
trapshooting, swimming and horse racing.
Box scores and summaries of sports results
usually set in agate type were omitted from the
copy set by the linotype trainees. Each story
was given a “slug” or tag to identify it.




Before the operators began work they were 
cautioned to set the stories exactly as they ap- 
peared on the copy. In the wire or all-capital 
version of the story the capital letter was un- 
derlined so the linotype operator would set 
that letter in a capital. Otherwise there was 
no difference between the two types of material. 

Each of the twelve operators was given an 
opportunity to set the same stories from the 
two types of copy, upper case and lower case. 
If during the first week he worked on the all- 
capitals copy, the second week he set in type 
the same stories which had been retyped in 
lower case. If the operator had worked on 
lower case copy the first week, the ensuing 
week he was given the same stories to set from 
the upper case version. 

A record was kept of the number of lines set
daily by each operator from upper case or
from lower case copy. At the end of each
week these scores were totaled. The “t” test
for the significance of the difference between
the two groups of scores was then made.*


* The “t” test employed was that for related measures,
i.e., Theorem 4 in Lewis, D., Quantitative methods
in psychology. Ann Arbor: Edwards Brothers, Inc.,
pp. 194-196.
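
The related-measures “t” test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `paired_t` and the twelve pairs of scores are hypothetical and are not the study's data, and the 2.20 threshold is the five percent value quoted in the Results for twelve operators. The statistic is the mean of the paired differences divided by the standard error of those differences.

```python
from math import sqrt
from statistics import mean, stdev

# Five percent value quoted in the article for twelve operators.
CRITICAL_T = 2.20


def paired_t(first, second):
    """Return (mean difference, S.E. of difference, t) for related measures."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = mean(diffs)
    # Sample standard deviation (n - 1) of the differences over sqrt(n).
    se_diff = stdev(diffs) / sqrt(len(diffs))
    return mean_diff, se_diff, mean_diff / se_diff


# Hypothetical lines-per-minute scores for twelve operators (not the study's data).
upper_case = [1.20, 1.45, 1.60, 1.30, 1.75, 1.50,
              1.15, 1.90, 1.40, 1.35, 1.55, 1.65]
lower_case = [1.25, 1.40, 1.70, 1.35, 1.70, 1.60,
              1.20, 1.85, 1.50, 1.40, 1.50, 1.75]

mean_diff, se_diff, t = paired_t(upper_case, lower_case)
significant = abs(t) >= CRITICAL_T
print(f"mean difference = {mean_diff:.3f}, S.E. = {se_diff:.3f}, t = {t:.2f}")
print("significant at the five percent level" if significant else "not significant")
```

Each operator serves as his own control here, which is why the differences are taken pair by pair rather than comparing two independent group means.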


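Table 1

Operator's Weekly Average Speed (Lines per Minute) and
Weekly Average Errors (Errors per Line)

Columns: (1) Operator’s Number; (2) Weekly Average Speed,
Upper Case Copy; (3) Weekly Average Speed, Lower Case
Copy; (4) Weekly Average Errors, Upper Case Copy;
(5) Weekly Average Errors, Lower Case Copy.
Rows: the twelve operators, followed by Mean, S.D.,
Mean Difference, S.E. Difference, and t.

Column 4, in table order: .030, .054, .063, .050, .050,
.071, .048, .065, .056, .052, .053, .063
Column 5, legible entries: .045, .021, .043, .034, .062,
.024, .040, .190, .058, .059, .038
Other legible entries, position unclear: .055, .028
[Entries for columns 1 to 3 and the remaining summary
rows are illegible.]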






Three linotype machines were available in 
the Newspaper Production Laboratory for the 
experiment. So that no one operator should 
have the advantage of working on a machine 
which might be easier to operate, the operators 
were rotated on the machines. 

It was believed that as a corollary to deter- 
mining whether linotype operators could set 
more composition from lower case or from 
upper case copy, it would be important to dis- 
cover if the operators made more errors in 
setting upper case material than in setting 
lower case copy. 

Before beginning the experiment, a pilot 
test was run. This was done with a different 
group of twelve linotype operators to see 
whether or not it would be possible to control 
the necessary factors. Another purpose was to 
find out if any “bugs” would arise to hamper 
the project. 

The results of the pilot test were satisfactory 
and proved to be similar to those obtained in 
the experiment itself. 


Results 


Table 1, columns 2 and 3, shows the mean 
speed scores made by each of the twelve 
operators on the two kinds of copy. 

The mean speed of the 12 operators on the 
upper case copy was 1.48 lines per minute, 
while on lower case copy, the mean speed was 
1.52 lines per minute. The range of operators’ 
performances varied from 1.13 to 2.16 lines per 
minute on upper case copy, and from 1.20 to 
1.97 lines per minute on lower case copy. 

Table 1, columns 4 and 5, shows the mean 
error scores made by each of the twelve oper- 
ators on upper case and lower case copy. The 
mean errors of all operators on upper case 
copy was .055 errors per line compared to .056 
errors per line on lower case copy. The range 
ran from .030 to .071 errors per line on upper 
case and from .021 to .190 on lower case copy. 

At the bottom of Table 1, the mean difference,
standard deviation, standard error of
difference and the “t” test values are shown.

It was decided that a difference would not
be considered significant unless it reached
the five percent level. To be
significant at the five percent level the “t”
test value would have had to be 2.20.

No significant difference was shown in setting
from upper case copy as compared to
lower case copy. This was true for both speed
and errors.

Conclusions drawn from this experiment are
not presented as definitive proof that the
findings apply to all operators. The students
at the time they began this experiment had
had twelve weeks of training during which
period they had practiced an average of ten
hours a week on the linotype. Only to this
extent can the findings of the experiment be
applied generally, and then they can be
generalized for trainees only.

Although no attempt is made to project 
these findings, it may be helpful to know, 
simply as a frame of reference, what is expected 
of an experienced newspaper linotype operator. 
Three persons in the field with long experience, 
when consulted, agreed that an experienced 
operator could set between 1,600 and 1,700 
lines of type a day and not make more than 
six errors per galley. 

The speed of composition of an operator
setting 1,600 lines would be 3.3 lines per minute.
Figuring that there are 170 lines of type to a
galley,* an operator making six errors per
galley would have an error rate of .035 per line.

The average speed of all the linotype
trainees on both types of copy (upper case and
lower case) was 1.50 lines per minute. The
average errors made by the twelve trainees on
both types of copy was .0555. Thus it appears
that the linotype operators participating in
this experiment were, as a group, slightly less
than half as fast and made nearly twice as
many errors as an experienced newspaper
linotype operator.
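
The comparison with experienced operators rests on a little arithmetic that can be checked directly. The sketch below uses the figures quoted in the text; the eight-hour working day is an assumption introduced here (the article does not state the length of the day), chosen because it reproduces the quoted 3.3 lines per minute.

```python
# Figures quoted in the text.
LINES_PER_DAY = 1600        # lower bound for an experienced operator
LINES_PER_GALLEY = 170
ERRORS_PER_GALLEY = 6
TRAINEE_SPEED = 1.50        # lines per minute, both kinds of copy
TRAINEE_ERRORS = 0.0555     # errors per line, both kinds of copy

# Assumption (not stated in the article): an eight-hour working day.
MINUTES_PER_DAY = 8 * 60

expert_speed = LINES_PER_DAY / MINUTES_PER_DAY        # lines per minute
expert_errors = ERRORS_PER_GALLEY / LINES_PER_GALLEY  # errors per line

speed_ratio = TRAINEE_SPEED / expert_speed
error_ratio = TRAINEE_ERRORS / expert_errors

print(f"experienced: {expert_speed:.1f} lines/min, {expert_errors:.3f} errors/line")
print(f"trainee speed is {speed_ratio:.2f} of the experienced rate")
print(f"trainee error rate is {error_ratio:.2f} times the experienced rate")
```

Under that assumption the trainees work at a little under half the experienced speed, with an error rate somewhat more than one and a half times the experienced figure.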

In concluding, it may be well to mention
that linotype operators, by the very nature of
their job, must read carefully every letter and
symbol of the material they set in type. As a
result their reading rate is slowed considerably
since their job motivates them to stress accuracy
in reading copy, rather than speed or
comprehension. It is possible that this may
account in large part for the lack of significant
difference between the setting of the two different
types of copy. Conversations with several
linotype operators indicated that in setting
ordinary material, they do not read it letter
by letter unless the words are foreign to them,
are technical or complicated, are names, or are
spelled differently from the way to
which they are accustomed.

* The figure 170 lines to the galley is based on the
assumption the operator is setting 8-point type on an
8½-point slug and the slug is 12 ems wide.

As was stated previously, in setting copy
experienced operators are required to set a
certain number of lines per day with a certain
minimum of errors. Thus, in setting type
there is a premium on both speed and accuracy.
It may be added that it costs approximately
fifteen cents per line to make a correction.

After the experiment was concluded, the
participants were interviewed individually in
an effort to discover which of the two types of
copy they preferred, upper case or lower case.

Eight of the twelve unhesitatingly said they
preferred lower case copy. They claimed it
was easier to read and that marking individual
letters for capitalization sometimes was confusing.
Two of the twelve were just as unhesitating
in stating a preference for upper case.
They said it was easier to read because the
large cap letter was more easily perceived.
The remaining two said they had no preference.

In matching the individual’s record against
his performance, it was ascertained that the
attitude of the operator had little influence on
what he did. In practically all of the cases,
the operators sometimes would set the upper
case copy faster one day and the lower case
the next.


Received November 14, 1949 





Design Complexity as a Determiner of Visual Attention Among 
Artists and Non-Artists * 


Walter A. Woods and James C. Boudreau 


Pratt Institute, Brooklyn, New York


Observations by artists and art critics are
frequently concerned with the need to train
the artist to “see.” Zadkin (8) speaks of
teaching the pupil to see. Ensor (3) distinguishes
between the practiced eye and the
“more common eye.” Roger Fry (4) comments:
“Now this specialization of vision goes
so far that ordinary people have almost no idea
of what things really look like . . . the moment
an artist who has looked at nature brings to
them a clear report of something definitely
seen by him, they are wildly indignant at its
untruth to nature.”

These observations are indicative of a rather
widespread belief that the artist sees “better”
or in a manner different from the non-artist.
Preliminary attempts to investigate the visual
patterns of artists and non-artists are briefly
reported by Brandt (2) and Buswell (3). As
yet no evidence has come to light which has a
direct bearing on or which enables us to arrive
at any general understanding of the problem.

The present experiment was designed to
determine whether it was possible to discover
differences in the “seeing” processes of artists
and non-artists, and whether in fact, differences
do exist. This report, which should be regarded
as an exploratory experiment, suggests
that it is possible to measure sensory differences
in visual activity and that artists do differ, as
a group, from non-artists.


Selection of a Visual Stimulus


The determination of the stimulus elements
to be utilized stems from the hypothesis that
artists will tend to respond differently from
non-artists to designs of varying complexity;
that the artist will be more sensitive and the
non-artist less sensitive to complex designs.
This hypothesis is derived theoretically from
the long-established design concept of unity in
variety; that is, a satisfying work of art must
at the same time be sufficiently unified so that
attention will be centered on the design as a
whole, yet sufficiently diverse so that interest is
maintained. (See Graves (5) for a discussion
of design principles.) Presumably the artist
who is able to successfully execute a design
which meets these criteria has command of a
visual language of higher complexity than the
non-artist who is unable to execute satisfying
designs.

* The authors are indebted to Walter Civardi for
photographic consultation and Eugen H. Petersen for
aid in selecting and executing the designs.

That artists differ from non-artists in their
grasp of complexities of design units (and
among themselves as artists) is suggested by
the statement attributed to Picasso (6):
“Cubism is no different from any other school
of painting. The same principle and the same
elements are common to all. The fact that
for a long time cubism has not been understood
and that even today there are people who cannot
see anything in it means nothing. I do
not read English, an English book is a blank
book to me. This does not mean that the
English language does not exist.”

This statement illustrates one phase of our
problem and at the same time raises another:
what are the elements of design which are
representative of “varying complexity”? Is
color an element? Or form or space or subject
matter? We are aware of the frequent
criticism leveled at abstract art that it is
“confusing” and meaningless; or the less frequent,
but important criticism that the colors
dominate the design and thus the meaning must
be found in the “emotional” stimulus of the
colors. It is generally understood that abstract
art has some meaning (frequently believed
to be a hidden meaning) and that this
meaning (or the understanding of it) may
depend upon the visual sophistication of the
observer. According to some critics (and the
above comment of Picasso bears this out)
abstract designs do have symbolic meaning
apart from their purely sensory qualities.
Thus, an abstract design, as such, is not an
adequate stimulus for our study since the
grasp of symbolic content might enhance the
attention value for the artistically sophisticated,
regardless of sensory qualities.

The same criticism may be directed to color
as a criterion. Colors are generally considered
to have symbolic meaning (see Birren (1)) in
addition to their stimulus value. Past experiences
and associations may determine the
attention value of any particular color or
combination of colors, apart from such
attention-determining qualities of sensory appeal
as chromatic strength, reversal, and dramatic
quality.

Representational material (pictures of things) 
is equally unsatisfactory since the primary de- 
terminer of attention may be the symbols in 
themselves, apart from simplicity or complexity 
of the design. For example, a religious paint- 
ing might very well attract strong attention 
from artist and non-artist alike if both were 
equally religious while the non-religious artist 
as well as the non-religious layman might 
reject the painting because it lacked satis- 
factory design qualities. 

It became apparent that we were required to
select, for our stimulus, designs which were
neither representational nor abstract and which
were monochromatic; designs which in fact were
non-objective and which were black and white.


The Experiment 


A well established technique for the measurement
of differences in visual sensitivity
(differences, actually, in eye movements) is that
of Brandt (2). His report on the use of the
Bidimensional camera suggested that this device
and technique might be suitable for our
experiment. It was further decided to utilize
Brandt’s plan of dividing the area into four
stimulus areas. Design Charts A and B were
executed in accordance with this plan, and
with the above limiting qualifications: black
and white and non-objective. The original
designs were laid out on an area nine by fifteen
inches, in the same proportions as shown on the
accompanying charts. These designs were
planned, as is apparent in Figure 1, so that they
proceed in complexity from a single square to
a cube with the interior exposed, and from a
single broken line to three broken lines
interrupted by diagonals.

Fig. 1. Designs used in the experiment.

Preliminary experiments were conducted to
determine whether placement of the various
units or areas would influence responses to the
areas. Designs were rotated, following Brandt
(2), in each of the four quadrants. It was
apparent that position was not of significant
influence insofar as individuals were concerned
in the small sample used (eight individuals);
so, in view of economy considerations
(photographing and transcribing costs were
prohibitive), it was decided to eliminate position as
a control for the balance of the experiment.

The following groups of twelve subjects each
were selected for the experiment. For design
Chart A: (1) Third year advertising design
students; (2) First year (Foundation) art
students; (3) Secondary school students age
15-17 taking art courses on Saturday;
(4) Elementary school students age 12-14 taking art
classes on Saturday; (5) Third year Electrical
Engineering students selected for proficiency
in mathematics; and (6) Third year Home
Economics (foods) students. For design Chart B
the following groups of twelve students each
were used: (1) Third year advertising design
students; (2) First year (Foundation) art
students; (3) Third year Electrical Engineering
students selected for proficiency in mathematics;
and (4) Secondary school students age
15-17 taking art classes on Saturday.

Although each group originally consisted of 
twelve students (except for the Foundation art 
group which consisted of thirty students) the 
final number of completed records was in all 
instances less than twelve. This situation
came about because of faulty recording for the 
most part due to the use of release positive 
film. Actual numbers are indicated for each 
group in each of the following tables. 

Employing the technique described by 
Brandt, the charts were placed in position on 
the rack of the camera and the subjects (ob- 
servers) were asked to view the designs. 
Chart A was submitted first, to six groups; 
Chart B to four groups as noted above. The 
original intention was to use only Chart A, but 
after the experiment had commenced, it was 
decided to use Chart B as well as Chart A. 
The observer was allowed to look at the chart 
for nine seconds and then was instructed to 
close his eyes. Chart B was then placed in the 
rack and the observer instructed to open his 
eyes. He was allowed eight seconds to observe
Chart B. Observers were given no instructions
as to what to look for or how to observe
the designs. They were simply instructed:
“You will be shown some designs.
Open your eyes when told to do so and close
them when told to do so.”

Responses of the observers were analyzed 
in terms of the following aspects of visual
response: (1) Mean time spent by each group in
observing each area of each chart; (2) Per cent
of total time spent in viewing each area of each
chart by each group; (3) Per cent of initial
fixations made in each area by each group; (4) Per
cent of total eye fixations made in each area by
each group; and (5) Average number of fixations
made by each group in each area. Analysis of
variance was performed to determine variance 
ratios of mean time spent by each group in 
viewing each area, and to determine level of 
significance of these means. 
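
The five measures listed above can be sketched in code. The following is a minimal illustration only, assuming each observer's record is available as an ordered list of (area, duration in seconds) fixations; the record format, the function names, and the sample values are hypothetical and are not taken from the study.

```python
def observer_metrics(fixations, n_areas=4):
    """Summarize one observer's record.

    fixations: list of (area, seconds) pairs in viewing order,
    with areas numbered 1..n_areas (a hypothetical record format).
    """
    if not fixations:
        raise ValueError("an observer needs at least one fixation")
    seconds = [0.0] * n_areas
    counts = [0] * n_areas
    for area, duration in fixations:
        seconds[area - 1] += duration
        counts[area - 1] += 1
    return {
        "seconds": seconds,                 # time spent in each area
        "counts": counts,                   # number of fixations in each area
        "initial_area": fixations[0][0],    # area of the first fixation
    }


def group_summary(observers, n_areas=4):
    """Compute the five group-level measures for a list of observer records."""
    per_obs = [observer_metrics(f, n_areas) for f in observers]
    n = len(per_obs)
    areas = range(n_areas)
    mean_seconds = [sum(m["seconds"][a] for m in per_obs) / n for a in areas]
    total_seconds = sum(mean_seconds)
    total_counts = [sum(m["counts"][a] for m in per_obs) for a in areas]
    all_fixations = sum(total_counts)
    initial = [0] * n_areas
    for m in per_obs:
        initial[m["initial_area"] - 1] += 1
    return {
        "mean_seconds": mean_seconds,                                   # measure 1
        "pct_time": [100 * s / total_seconds for s in mean_seconds],    # measure 2
        "pct_initial": [100 * c / n for c in initial],                  # measure 3
        "pct_fixations": [100 * c / all_fixations for c in total_counts],  # measure 4
        "mean_fixations": [c / n for c in total_counts],                # measure 5
    }


# Two made-up observers, for illustration only.
obs1 = [(1, 0.5), (4, 2.0), (4, 1.5), (2, 1.0)]
obs2 = [(3, 1.0), (4, 3.0), (1, 1.0)]
summary = group_summary([obs1, obs2])
```

Keeping the per-observer totals separate from the group averages makes it possible to run the analysis of variance on the same intermediate values.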

Results of the experiment are shown in the 
following tables. It is indicated (Tables 1 and 
4) that art students tend to spend more time 
in viewing the complex areas and less time in 




Table 1

Per Cent of Time Spent in Viewing Each of Four Areas
of Design Chart A by Members of the Following Student
Groups: (1) Third Year Home Economics (Foods);
(2) Students Age 12-14 Taking Saturday Art Classes;
(3) Third Year Electrical Engineering Students; (4)
Students Age 15-17 Taking Saturday Art Classes;
(5) Students in Advertising Design; and (6) Foundation
(First Year) Art Students

                                      Per Cent of Time in Each Area
Group                        N        Area 1   Area 2   Area 3   Area 4
Home Economics               6          20       24       26     [illegible]
Saturday Art Age 12-14       9          15       17       31       37
Electrical Engineering   [illegible]    20       22       22       35
Saturday Art Age 15-17      10          20       20       20       39
Advertising Design           9          12       20       22       46
Foundation Art              20          14       15       21       50


viewing the simple areas of the designs, whereas 
non-art students tend to distribute their time 
more evenly over the four areas. From Table 
3 it will be seen that the between group vari- 
ance exceeds the within group variance in 


Table 2

Mean Time (in Seconds) Spent in Viewing Each of Four
Areas of Design Chart A by Members of the Following
Student Groups: (1) Third Year Home Economics
(Foods); (2) Students Age 12-14 Taking Saturday Art
Classes; (3) Third Year Electrical Engineering; (4)
Students Age 15-17 Taking Saturday Art Classes; (5)
Advertising Design; and (6) Foundation (First Year)
Art Students

                                   Mean Time in Each Area
Group                      N    Area 1   Area 2   Area 3   Area 4        Total
Home Economics             6     1.84     2.16     3.34     2.59          2.25
Saturday Art Age 12-14     6     1.42     1.42     3.25     2.94          2.25
Electrical Engineering     6     2.00     1.94     2.09     3.00          2.25
Saturday Art Age 15-17     6     1.75     1.67     1.75     [illegible]   2.25
Advertising Design         6      .92     1.84     2.00     4.34          2.25
Foundation Art             6     1.17     1.09     1.94     4.84          2.25
Total                     36     1.52     1.68     2.22     3.59          2.25
Table 3

Analysis of Variance of Mean Time Spent in Viewing Each of Four Areas of Chart A by Members of
Six Student Groups (Based on Data in Table 2)

Source                        df    Sum of Squares    Mean Square

Advertising Design       Total
                         Between areas
                         Among individuals
Foundation Art           Total
                         Between areas
                         Among individuals
Art Age 15-17            Total
                         Between areas
                         Among individuals
Art Age 12-14            Total
                         Between areas
                         Among individuals
Electrical Engineering   Total
                         Between areas
                         Among individuals
Home Economics           Total
                         Between areas
                         Among individuals
Area I                   Total
                         Between groups
                         Among individuals
Area II                  Total
                         Between groups
                         Among individuals
Area III                 Total
                         Between groups
                         Among individuals
Area IV                  Total
                         Between groups
                         Among individuals
Total                    Between groups
                         Within groups
                         Areas
                         Student groups
                         Interaction

[Numerical entries illegible except the values 59.83 and 17.16, whose rows cannot be identified.]

Between groups F equals 8.5**
Areas F equals 11.2**
Groups F equals 0.0


the ratio of 23.9:2.8. These data, which
indicate that a substantial proportion of
variability is due to differences between the
groups, are significant at the .001 level. A
variance ratio of 22.1:3.7 for Table 6 is equally
significant and supports the hypothesis that
the various groups are (with exceptions noted
by a more detailed analysis of variance and F
test found in Tables 3 and 6) representative of
different populations.
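
The variance ratios quoted above can be checked by simple division, on the reading that each ratio is the between-groups mean square over the within-groups mean square. The sketch below also includes a generic one-way analysis of variance for grouped observations; the function names and the sample data are hypothetical, and the routine is a standard textbook computation rather than a reproduction of the author's worksheet.

```python
def variance_ratio(ms_between, ms_within):
    """F ratio: between-groups mean square over within-groups mean square."""
    return ms_between / ms_within


def one_way_anova(groups):
    """Return (mean square between, mean square within, F) for lists of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (n_total - len(groups))
    return ms_between, ms_within, ms_between / ms_within


# Ratios quoted in the text for Chart A (Table 3) and Chart B (Table 6).
f_chart_a = variance_ratio(23.9, 2.8)   # about 8.5
f_chart_b = variance_ratio(22.1, 3.7)   # about 6.0

# Made-up scores for three groups, for illustration only.
ms_b, ms_w, f_value = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

The first quotient agrees with the between-groups F of 8.5 printed beneath Table 3.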

In viewing Chart A (Table 1) first year art
students spent the greatest per cent of time
(fifty per cent of total time) in viewing area
IV, advertising design students spent 46 per
cent of their time in that area, and were followed
in order by the 15-17 age group, the 12-14 age
group, engineering students and home economics
students. It will be noted that the
young art groups (age 12-14 and 15-17) both
devote more time to the complex areas than
do the older third year college engineers and
home economics majors. It is indicated in
this table that progressively more time is spent
in viewing the complex areas by the art student
and by the non-artist, but that the rates of
progression are substantially greater for the
art student. The same general pattern holds
for Table 4 (per cent of time spent in viewing
Chart B) except that here it is indicated that
the engineer spends a somewhat greater proportion
of time than the first year art student
in viewing the more complex area III. The
comparatively simple area I demands less attention
by all except the secondary art student
group. In this instance age (contrary to
Buswell’s findings) rather than artistic
sophistication might be the determinant.

Tables 2 and 5 give mean time in seconds 
spent by each group in viewing Charts A and 
B and give the same information (in different 
form) as Tables 1 and 4. Tables 2 and 5 
provide a basis for a determination of signifi- 
cance of differences and for analysis of variance. 

Tables 7 and 8 indicate that initial eye fixations
are primarily established in the upper
areas (areas I and III), and are primarily 
directed to the right (observer's right). This 
is contrary to the findings of Brandt, and con- 
trary to the generally accepted belief that 
initial eye fixations, determined by reading 


Table 4

Per Cent of Time Spent in Viewing Each of Four Areas
of Design Chart B by Members of the Following Student
Groups: (1) Third Year Advertising Design;
(2) Third Year Electrical Engineering; (3) Foundation
(First Year) Art Students; and (4) Students Age 15-17
Taking Saturday Art Classes

                                 Per Cent of Time in Each Area
Group                      N     Area 1   Area 2   Area 3   Area 4
Advertising Design         8       11       43       16       30
Electrical Engineering     9       14       35       19       32
Foundation Art            14       14, 32, 20 legible; one entry missing
Saturday Art Age 15-17    11       23      29.5      18      29.5
Table 5

Mean Time (in Seconds) Spent in Viewing Each of Four
Areas of Design Chart B by Members of the Following
Student Groups: (1) Third Year Advertising Design;
(2) Third Year Electrical Engineering; (3) Foundation
(First Year) Art Students; and (4) Students Age 15-17
Taking Saturday Art Classes

                                   Mean Time in Each Area
Group                      N     Area 1        Area 2        Area 3        Area 4        Total
Advertising Design    [illegible] [illegible]  [illegible]   [illegible]   [illegible]    2.0
Electrical Engineering     9     1.19          2.88          1.38          2.06           2.0
Foundation Art            14     1.87          2.25          1.38          2.5            2.0
Saturday Art Age 15-17    11     [illegible]   [illegible]   [illegible]   [illegible]    [illegible]
Total                     42     [illegible]   2.89          1.28          2.58           2.0


habits, are most frequently directed to the 
observer’s left. It will be noted that the non- 
art groups and the younger art groups tend to 
make more initial fixations to the left (area I). 
Two explanations are suggested: (1) The indi- 
vidual who is less sensitive visually will be 
dominated by reading habits while the visually 
sophisticated will be more attentive to visual 
forms and less dominated by reading habits or, 
(2) The artistic individual (or the visually 
sensitive individual) will be less likely to have 
developed strongly established reading habits 
and is therefore less dominated by habits which 
stem from reading. The latter view is in conformity
with other data which suggest that a
slight negative correlation exists between per- 
formance in art, as measured by grades in art 
courses, and verbal ability, as measured by 
the L score of the American Council on Edu- 
cation Psychological Examination (7). 

Tables 9 and 10 demonstrate that the more 
artistically sophisticated groups tend to make 
fewer total fixations and tend to make a pro- 
portionately greater per cent of fixations in 
the more complex areas. The pattern of re- 
sponses of the 12-14 age group indicates fewer 
fixations than is found for the 15-17 group. 

Tables 3 and 6 give analysis of variance of 
the mean time spent in viewing each area by 
each group. Table 3 is based on means pro- 
vided in Table 1 derived from responses to 
Table 6

Analysis of Variance of Mean Time Spent in Viewing Each of Four Areas of Design Chart B by Members of
Four Student Groups (Data from Table 2)

Source of Variation                                 df    Sum of Squares    Mean Squares

Advertising Design       Total                      31        256
                         Between areas               3        135
                         Among individuals          28        121
Electrical Engineering   Total                      31        188
                         Between areas               3         68
                         Among individuals          28        119.
Foundation Art           Total                      31        238
                         Between areas               3        105
                         Among individuals          28        132.8
Art Age 15-17            Total                      31         64
                         Between areas               3
                         Among individuals          28
Area I                   Total                      31
                         Between student groups      3
                         Among individuals          28
Area II                  Total                      31
                         Between student groups      3
                         Among individuals          28
Area III                 Total                      31
                         Between student groups      3
                         Among individuals
Area IV                  Total                      31
                         Between student groups      3
                         Among individuals
Total                    Between groups
                         Within groups
                         Areas
                         Student occupations
                         Interaction

[Remaining entries illegible.]

Between groups F equals 6.**
Areas F equals 8.7*
Groups F equals 0

Table 7

Per Cent of Initial Fixations Made in Each of Four
Areas of Design Chart B by Members of
Four Student Groups

Group
Advertising Design
Foundation Art
Electrical Engineering
Saturday Art 15-17
Total

[Percentages illegible.]

Table 8

Per Cent of Initial Fixations Made in Each of Four
Areas of Design Chart A by Members of
Six Student Groups

Group                      Legible entries (one of the four area columns is illegible)
Advertising Design         25.0     0.0    37.5
Foundation Art             10.0     5.0    70.0
Saturday Art 15-17         36.3     0.0    45.6
Saturday Art 12-14         33.3     0.0    44.4
Electrical Engineering     44.4    11.2    44.4
Home Economics             50.0     0.0    33.3
Table 9

Per Cent of Eye Fixations Made in Each of Four Areas
of Design Chart B by Members of Four Student
Groups and Mean Number of Fixations
for Each Group in Each Area

Group                                Area 1   Area 2   Area 3   Area 4
Advertising Design        %
                          mean N
Electrical Engineering    %
                          mean N
Foundation Art            %
                          mean N
Saturday Art 15-17        %
                          mean N

[Entries illegible except isolated values (27.4, 21., 2.36) whose cells cannot be identified.]


Chart A. Table 6 is based on analysis of 
variance of means in Table 3 and is derived 
from responses to Chart B. In general there 
is greater variation between area responses 
than among individuals in the groups. Home 
economics and engineering students tend to be 
most homogeneous in their visual patterns (less 
among individuals variance) and tend to view 


Table 10

Per Cent of Eye Fixations Made in Each of Four Areas
of Design Chart A by Members of Six Student
Groups and Mean Number of Fixations
for Each Group in Each Area

Group                                Area 1   Area 2   Area 3   Area 4
Advertising Design        %
                          mean N
Foundation Art            %
                          mean N
Electrical Engineering    %
                          mean N
Saturday Art 15-17        %
                          mean N
Saturday Art 12-14        %
                          mean N
Home Economics            %
                          mean N

[Entries illegible.]


the charts in a more homogeneous manner (less 
between areas variance) than do art students. 
Foundation art students are homogeneous as a 
group, but exhibit great variability in their 
response to areas. Only in the instance of the 
12-14 age group do we find that variability due 
to differences among individuals within the 
group is greater than the variability due to 
differences in area complexity. 

As to the variability which is brought about 
by differences between areas it may be noted 
that area IV (Table 3) draws the most variable 
response, both between the groups and among 
individuals within the groups, while the less 
complex areas I and III draw a much more 
uniform response. Variability due to differ- 
ences among individuals is less than the vari- 
ability due to differences between groups, ex- 
cept in the instance of area IV, wherein the 
among individuals variance is greater than the 
between groups variance. 

In the assignment of total variance for all 
groups, between groups variance is substantially
larger than within groups variance. The
greater part of total variance must be assigned 
to differences between areas. 


Summary 


The present experiment attempted to arrive 
at some preliminary conclusions regarding the 
influence of design complexity in determining 
visual attraction or attention. It is indicated
that art students tend to devote a greater
proportion of their observation time to the more
complex areas than do non-artists. Variance
due to complexity of design is significant for
the more sophisticated art groups, as are the
differences in mean time spent in viewing the
designs. However, significant differences are
not found for the less sophisticated groups,
indicating that sensitivity to design complexity
is a developmental process which increases
with age and with level of artistic
sophistication. The general pattern of data
supports the hypotheses that: (1) differences in
visual sensitivity or visual awareness and attention
do exist between artists and non-artists
and between age groups; and (2) art groups
are more sensitive to or pay more attention to
more complex design units than do non-artists
when the factors of color and objectivity have
been eliminated from the designs.
Verbal and Pictorial Questionnaires in Market Research *


Joseph Weitz 


Carnegie Institute of Technology 


The purpose of this study was to compare the 
results obtained from two different types of 
questionnaires commonly used in market re- 
search. The two questionnaires used in this 
study were a verbal and a pictorial question- 
naire. Frequently in a single survey a ques- 
tionnaire will be used which contains both 
verbal questions and a choice of pictorial items. 
The data obtained from these two types of
questions are often treated similarly. It is 
hypothesized in the present study that different 
results may be obtained using these two tech- 
niques. If this is so, it does not seem war- 
ranted to treat the results in the same manner 
nor to evaluate equally the data obtained from 
these two sources.


Procedure 


The subject to be studied in this survey is the 
design of the cooking range. This was chosen 
since it is generally present in some form in 
every household and hence people to be in- 
terviewed would be familiar with it. Two 
questionnaires were compared, one verbal and 
one pictorial, concerning the design of the 
cooking range. The questions on the verbal 
questionnaire were as follows: 


(1) Do you prefer the table top (low oven)
or the high oven?

(2) Do you prefer a window in your oven?

(3) Do you prefer a high or low location for
your broiler?

(4) Do you prefer burner controls on the
back vertical panel or the front panel?

(5) Which of the following burner arrangements
do you prefer?

(a) Two burners on each side with work
space in the center

(b) Four burners on one side with work
space on the other side

(c) Four burners staggered across the
entire top

(d) Four burners across the back with
work space across the front

(e) Two burners on each side with a
built-in griddle in the center

(6) Which of the following do you prefer?

(a) Oven in the center with storage space
on both sides

(b) Oven on the right with storage space
on the left

(c) Oven on the left with storage space on
the right

(d) Double oven

(e) A high oven with storage space below

(7) If you had your choice, what color would
you choose for your stove?

(8) Do you prefer to have toe space at the
base of your stove?

(9) Do you prefer a hinged door or a drawer
type storage area?

* The author wishes to express his appreciation to
Mr. David Ellies who did all of the interviewing and
generally assisted throughout the entire project.


The visual questionnaire was composed of a
series of sketches involving the same
discriminations which were asked for in the verbal
questionnaire. For example, for question three
(do you prefer the high or low location for
your broiler?) two drawings were made in the
pictorial questionnaire, one with a high
broiler and one with a low, both stoves having
the same basic design (see Figure 1). The
person being interviewed was asked which of
these stoves she would prefer. The same was
done for all of the other questions.

For both groups other information was obtained,
such as educational background, number
of years the individuals had used the particular
stove they now have, and what type of
stove they were using at present. In this 
paper only the data pertinent to the original 
hypothesis will be presented. 

The survey was conducted in the city of
Pittsburgh. The sample consisted of 200
adult females. This total sample was divided
into two groups of 100 each; one group received
the verbal and one the pictorial questionnaire.
In each sample of 100, 10% were
from the A or highest socio-economic group;
30% from the B socio-economic group; 40%
from the C socio-economic group and 20%
from the D socio-economic group. In this way
the two groups were matched on the basis of
socio-economic background.¹

Fig. 1. Sketches used for question 3 in the
Pictorial Questionnaire.

One interviewer was used for all 200 cases.
This individual was an experienced interviewer.
It was thought necessary to use only
one interviewer so as to reduce that variable
to a minimum. In both the verbal and pictorial
questionnaires the interviewer returned
to those addresses where no one was home at
the first call. Further, in administering both
the verbal and pictorial questionnaire the
internal relationships of the sample were maintained
during the study. That is, all of the A
group were not interviewed before the B, etc.;
for example, 1 A was interviewed, then 3 B’s,
then 4 C’s, then 2 D’s, etc., so that no one group
was completed before starting the next.

¹ Manual for research associates and interviewers. New
York: Market Research Division, The Psychological
Corporation. Part 2, pp. 5-6.
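
The interleaved interviewing order described above can be sketched as follows. This is an illustration only, assuming the rotation of 1 A, 3 B's, 4 C's, and 2 D's is repeated until each quota is filled; the function name and its arguments are hypothetical.

```python
def interleaved_schedule(quota, block):
    """Build an interview order that cycles through the groups in fixed blocks.

    quota: total interviews required per group, e.g. {"A": 10, ...}
    block: interviews taken from each group per round, e.g. {"A": 1, ...}
    """
    remaining = dict(quota)
    order = []
    while any(remaining.values()):
        for group, per_round in block.items():
            take = min(per_round, remaining[group])
            order.extend([group] * take)
            remaining[group] -= take
    return order


# Quotas for one sample of 100 and the rotation given in the text.
quota = {"A": 10, "B": 30, "C": 40, "D": 20}
block = {"A": 1, "B": 3, "C": 4, "D": 2}
schedule = interleaved_schedule(quota, block)
```

Because the block sizes are in the same proportion as the quotas, every round of ten interviews reproduces the 10/30/40/20 composition of the whole sample.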


Results 


The results of the study are shown in Table 1. 
It can be seen that in all cases, with the ex- 
ception of question two, there was a significant 
difference at least at the 1% level as deter- 
mined by the Chi-Square test. 

On several items there was a complete reversal
of the preferred response from the verbal
to the pictorial questionnaire. In question
seven, concerning color preference, the colors 
other than white were grouped in one category 
and white in the other category, giving a two 
by two table rather than a five by two table, 
which would have been the case had all the 
colors been used (black, green, cream, blue and 
white). 
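
The Chi-Square comparison of the two samples on a two-choice item can be sketched as below. The counts are invented for illustration and are not the study's data; the function name is hypothetical, and no continuity correction is applied, since the text does not say whether one was used.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table: list of rows of observed counts, e.g. one row per questionnaire type.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts: rows are verbal and pictorial samples of 100 each,
# columns are the two choices (for example, white versus not white).
hypothetical = [[60, 40],
                [35, 65]]
statistic, df = chi_square(hypothetical)

# Critical value of chi-square for 1 degree of freedom at the 1% level.
CRITICAL_1PCT_DF1 = 6.635
significant = statistic > CRITICAL_1PCT_DF1
```

Grouping the colors into white and not white, as described above, reduces the five by two table (4 degrees of freedom) to a two by two table (1 degree of freedom).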

Previously it was stated that even though no 
attempt was made to equate the two groups 
on the basis of educational background, they 
turned out to be quite similar. This can be 
seen from Figure 2. Since there would be 
fewer than five cases in some cells if Chi-Square 
were computed separately for each socio- 
economic group, all groups were combined to 
give the total educational level of each sample. 
From this combination of data the computed 
Chi-Square results in a value not significant 
at the 30% level. This then is an added check 
on the homogeneity of the two samples used 
and would lend weight to the assumption that 
any differences which exist are due primarily 
to the type of questionnaire used. It is of in- 
terest to note the change in educational back- 
ground from the A group to the D group. 

It can be seen from Table 1 that the results
obtained from the pictorial and the verbal
questionnaire cannot be considered as homogeneous.
Since the sampling technique was
identical for both samples and since there was
evidence of homogeneity of the educational
background of the two groups, it would seem
evident that the differences observed are due
to the differences in the questionnaire technique.
If this is so, the original hypothesis is
substantiated and one must differentially
evaluate results obtained in these two ways.
Table 1

Preferences of Interviewees in Each Sample

Question                  Choice                  Chi-Square Value    P
1. High or low oven       High                    [illegible]         .010
                          Low
2. Window in oven         Yes                     [illegible]         .300 to .500
                          No
3. Broiler location       High                    [illegible]         .010
                          Low
4. Burner controls        Front                   10.98               .001
                          Back
5. Burner arrangement     2 each side             15.06               .010
                          4 one side
                          4 across top
                          4 across back
                          2 each side griddle
6. Oven arrangement       Center                  [illegible]         [illegible]
                          Right
                          Left
                          Double
                          High
7. Color                  White                   [illegible]         [illegible]
                          Not white
8. Toe space              Yes                     [illegible]         [illegible]
                          No
9. Storage                Drawer                  [illegible]         [illegible]
                          Hinge

[Number Preferring each choice in the verbal and pictorial samples is illegible.]

Fig. 2. Educational background of each socio-economic group in each sample
(verbal and pictorial).
The present study throws no light on the
important problem of which of these two 
methods is more valid. That is, if a survey is 
to predict consumer behavior it should ob- 
viously be important to know which of these 
techniques, pictorial or verbal questionnaire, 
comes closer to actual buying behavior. Fur- 
ther research must be done in order to compare 
the usefulness of these two techniques with re- 
spect to their predictive value for consumer 
behavior. 

It should be pointed out that it is possible 
that had other pictures been used there might 
have been different results; therefore the pic- 
torial representation itself might be studied to 
determine the amount of variability obtained 
with various forms of visual presentation of
questions.


Summary 


Two different questionnaires were administered
to two samples of 100 each. These
samples were matched for socio-economic
status and were homogeneous with respect to
educational background. One group received
a pictorial questionnaire, one group received 
a verbal questionnaire. Statistically signifi- 
cant differences were obtained between the two 
techniques. It is concluded from this study 
that one cannot use these two questionnaire 
techniques interchangeably and that the data 
obtained from these two methods should not 
be equally evaluated. 


Received November 7, 1949





An Exploratory Study of Linear Interpolation 


Harry Kreiger Miller, Jr. 


Lehigh University 


The psychological problem of estimating the
position of a point relative to two markers between
which it falls is of vital importance in
studies of dial and scale reading. Until recently
there has been comparatively little scientific
research published on this phase of interpolation.
Work which has been carried out
has indicated that there are three explanations
for the errors which occur in such interpolations:
(a) individual differences; (b) size of the
interpolated interval and of the markers; and
(c) biases.

Recognition that individual differences play
an important part in this problem is indicated 
in a study on dial readings by Kappauf, Smith, 
and Bray (7) when they came to the conclusion 
that “subject differences and subject inter- 
actions appear to demand an analysis of the 
data subject by subject rather than analysis 
of group averages.” In another study (9) on 
interpolation between circular scale markers, 
Leyzorek found that “individuals differ signifi- 
cantly in their abilities to perform this kind of 
visual interpolation.”

That accuracy of interpolation varies ac- 
cording to the size of the interval has been ob- 
served in most of the studies on this subject, 
including the two just mentioned (7) and (9). 
Grether and Williams (5) concentrated on this 
factor in their experiment on accuracy of dial 
reading, and in another study (8) on dial size 
and graduation Kappauf and Smith found that 
frequency of dial reading errors was primarily 
a function of the size of the scale unit. The 
size of the markers also has some effect, as is
mentioned by Bäckström (1) and some of the
aforementioned. 

Biases, or errors not randomly distributed,
are the third category of causes of errors.
Chapanis (3) found that whether the position
of a point was underestimated or overestimated
varied systematically with its location.
Bartlett, Reed and Duvoison (2) and Leyzorek (9)
found some evidence of the same type of
biases. Reed and Bartlett (10) and Harriman
and Bartlett (6) studied these biases more
specifically in experiments on concentric rings
and positions along a short line, respectively.
Bäckström (1) in studies on a limited number
of subjects found that biases were of greater
influence than random errors as a cause of
discrepancies.



Purpose 


The purpose of this research was to systematically
study the accuracy of visual interpolation
with five different interval sizes, each
having interpolated positions 1 through 9, 
using twenty-one subjects of differing age and 
occupation. 


Procedure


Figure 1 shows a pattern of the 2 mm. size. 
Six different patterns of problems were drawn 
up originally using an interval of 10 mm. Each
pattern consisted of 54 problems, of which there 
were six each of the nine different positions 
(1, 2, 3, 4, 5, 6, 7, 8, 9) randomly arranged. 

Direct prints were made of the original six
patterns thus giving the 10 mm. interval.
Photographic reductions were used for the
5 mm., 3 mm., 2 mm., and 1 mm. interval sizes.
These were then placed on ordinary photographic
mounting board.

Fig. 1. Sample pattern of 2 mm. size.
Slotted shields were used to permit viewing
one horizontal row of problems at a time. 
Mimeograph answer sheets were provided. 

Each subject was given one size of pattern 
at a time with the corresponding shield and 
answer sheets. He was asked to estimate in 
tenths the position of the inner line in relation 
to the two outside markers and to enter his 
judgment on the appropriate place on the 
answer sheet. He was told that the inner line 
always fell on an exact tenth (i.e., 1, 2, 3, 4, 5, 
6, 7, 8, or 9). He was permitted to work at 
his own speed but was asked to be as accurate 
as possible with each interpolation. Use of the 
shield was optional, so that it was possible for 
him to compare problems against each other. 
Judgments were made at normal reading dis- 
tance and no attempt was made to control 
lighting. No measurement of the subject’s 
visual acuity was made. 

When the subject finished one set he was 
given the opportunity of doing another size 
immediately or of resting his eyes before attempting
the next group. The sets were not
given in any systematic order, but rather the
subject was given his choice as to what order
he desired to work on them. However, each
individual completed all the problems of one
size before being given the next set.

There were 324 judgments (6 patterns of 54
problems each) for each interval size, giving
a grand total of 1,620 estimations by each
subject.


Results 


A tabulation of the total number of errors 
according to the size of the interval is made 
in Table 1. (Occasional mistakes due to 
reversal in recording—i.e., placing a 1 on the 
answer sheet for a 9, or a 2 for an 8—were not 
counted as errors of interpolation.) 
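
The scoring rule just described can be sketched in code. This is a minimal illustration under stated assumptions: a recording reversal is taken to be a mirror-image response (10 minus the true position) that differs from the true position by at least six tenths, which covers exactly the two examples given (1 for 9, 2 for 8); how the author actually identified reversals is not stated. The function names and sample judgments are hypothetical.

```python
def classify(true_position, response, min_gap=6):
    """Classify one judgment as 'correct', 'reversal', 'L', or 'H'.

    'L' means the estimate is lower than the true position and 'H' higher.
    min_gap is an assumed threshold for treating a mirror-image response
    as a recording reversal rather than an error of interpolation.
    """
    if response == true_position:
        return "correct"
    is_mirror = response == 10 - true_position
    if is_mirror and abs(response - true_position) >= min_gap:
        return "reversal"
    return "L" if response < true_position else "H"


def tally_errors(judgments):
    """Count low and high errors at each interpolated position 1 through 9."""
    table = {position: {"L": 0, "H": 0} for position in range(1, 10)}
    for true_position, response in judgments:
        kind = classify(true_position, response)
        if kind in ("L", "H"):
            table[true_position][kind] += 1
    return table


# Made-up (true position, response) pairs, for illustration only.
sample = [(4, 3), (6, 7), (9, 1), (5, 5), (2, 8), (7, 8), (4, 6)]
errors = tally_errors(sample)
total_errors = sum(cell["L"] + cell["H"] for cell in errors.values())
```

With five interval sizes of 324 judgments each, such a tally would be run over 1,620 judgments per subject.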

The figures indicate that individual differ- 
ences are of primary importance, since the 
totals extend from 3 errors for the most ac- 
curate subject to 538 errors for the least 
accurate. 

Neither occupation nor sex is indicated as a 
major controlling factor. Of the four subjects 
making the fewest errors, two are males and 
two are females, one is a graduate engineering 


Table 1

Number of Errors According to Size of Interval

                                          Size of Interval
Subject   Occupation                 1 mm.  2 mm.  3 mm.  5 mm.  10 mm.  Total
RH        Grad. Eng. Stud.
FH        Housewife
GD        Grad. Educ. Stud.
LO        Undergrad. Eng. Stud.
RD        Undergrad. Eng. Stud.
FB        Housewife
HK        Grad. Psych. Stud.
RR        IBM Operator
GB        College Registrar
FF        Grad. Psych. Stud.
BG        7th Grade Stud.
EH        Undergrad. Psych. Stud.
ME        Office Clerk
JC        Housewife
WM        Undergrad. Bus. Stud.
HM        Accounting Clerk
DS        Housewife
ES        WAVE Officer
FL        7th Grade Stud.
BH        Office Clerk
WJ        Secretary

[Error counts and remaining columns illegible.]
Table 2

Number of Errors According to Interpolated Position
Note: L = Lower than true position; H = Higher than true position

Interpolated Position: 1 through 9 (L and H for each)

Subjects: BG, EH, ME, JC, WM, HM, DS, ES, FL, BH, WJ

[Error counts illegible.]


student, one is a housewife, one a graduate education
student, and one a second year undergraduate
engineering student. Of the four subjects
making the most errors, three are females
and one a male, all of different occupations. 

Although most of the subjects are in the age- 
group of 18 through 28, the spread in total 
number of errors of the individuals outside this 
group lends credence to the inference that age 
is not a primary factor in accuracy. 

In comparing the number of errors according
to the size of the interval it was again found
to be a matter of individual differences. Although
the highest number of errors were made
on the 1 mm. size by eleven of the subjects,
there were two who had the greatest number
of mistakes on the 2 mm. size, three who had
the greatest inaccuracy on the 3 mm. size, two
who experienced the most difficulty on the 5 mm.
intervals, and one who found the 10 mm. the
most difficult for correct interpolation; in
addition, one subject had an equal number of
errors on the 2 mm. and 5 mm., and one found
the 1 mm. and 3 mm. equally difficult. These
data apparently point up the fact that there
is no one optimum size for all subjects.

When the data were tabulated to show
possible bias due to position, the first ten subjects
in Table 1 showed so little error that there
was no indication of bias. However, the last
eleven subjects did show some trend and their
results are included in Table 2.

The smallest number of errors is found on the 1 and 2, and
the 8 and 9 positions, the extremes of the scale.
The 5 position, the center of the scale, had
comparatively few errors. The maximum number
of misjudgments were made on the 4 and 6
positions. The 4’s tended to be read as 3’s,
and the 6’s as 7’s, indicating a bias outward
from the center. The 7’s tended to be read as
8’s, and the 3’s as 2’s.

Individual differences, however, are apparent.
For example, subject BH overestimated
the 3’s, 4’s, and 5’s, and underestimated
the 7’s. Subject WJ made more mistakes on
the 1’s and 2’s than on any of the others,
whereas most subjects were comparatively
accurate on these positions. Subject JC was
least accurate on the 5’s. Some subjects, for
example EH, HM, BH, and WJ, made more
overestimations, while others, as ME, JC, WM,
and ES, had a greater number of
underestimations.



Although the small sample of subjects does
not permit statistical generalizations, the
unusually high level of performance of some
of the individuals is worthy of note. Four
had less than 4 of 1% errors, and the majority
of subjects had less than 6% errors. No misjudgments
were made on the 1 mm. size by
the two most accurate subjects; this means
that they discriminated differences of one
tenth of a millimeter.


Although some general trends are evident, 
the individual is paramount above the size of
the interval. For most subjects it became in- 
creasingly difficult to interpolate as the interval 
sizes decreased. Nevertheless, in some cases 
there was no decrease in accuracy and a few 
subjects did better at the smaller sizes. 


Summary 


In this study twenty-one subjects of differing 
age and occupation visually interpolated five 
interval sizes (1 mm., 2 mm., 3 mm., 5 mm., 
and 10 mm.), each size having 324 problems. 
The problems were randomly arranged in six 
different patterns, each pattern containing an 
equal number of the interpolated positions 1 
through 9. The subjects were asked to be as 
accurate as possible and were given as much 
time as they needed to complete each interval 
size. 

Eight of the subjects made fewer than thirty
errors on the 1,620 problems. Results indicated
that individual differences were of greater
influence than interval size or biases due to 
position differences. 
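
The arithmetic of the design described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' procedure: it assumes that the 324 problems of each interval size were divided evenly among the six patterns, which the summary does not state outright, and the function names (make_pattern, classify, error_rate) are hypothetical. The L and H labels follow the note to Table 2.

```python
import random
from collections import Counter

INTERVAL_SIZES_MM = [1, 2, 3, 5, 10]   # interval sizes named in the summary
PROBLEMS_PER_SIZE = 324                 # problems per interval size (from the summary)
N_PATTERNS = 6                          # number of random patterns (from the summary)
POSITIONS = list(range(1, 10))          # interpolated positions 1 through 9

# Assumption: the 324 problems of a size are split evenly over the six patterns.
PER_PATTERN = PROBLEMS_PER_SIZE // N_PATTERNS        # problems in one pattern
PER_POSITION = PER_PATTERN // len(POSITIONS)         # repeats of each position
TOTAL_PROBLEMS = len(INTERVAL_SIZES_MM) * PROBLEMS_PER_SIZE


def make_pattern(rng):
    """Return one randomly ordered pattern with an equal count of each position."""
    pattern = [pos for pos in POSITIONS for _ in range(PER_POSITION)]
    rng.shuffle(pattern)
    return pattern


def classify(true_position, reading):
    """Label a reading as correct, L (lower than true) or H (higher than true)."""
    if reading == true_position:
        return "correct"
    return "L" if reading < true_position else "H"


def error_rate(n_errors, n_problems=TOTAL_PROBLEMS):
    """Percentage of problems answered incorrectly."""
    return 100.0 * n_errors / n_problems


if __name__ == "__main__":
    pattern = make_pattern(random.Random(0))
    print(TOTAL_PROBLEMS, PER_PATTERN, Counter(pattern)[5])
    print(classify(4, 3), classify(6, 7))
    print(round(error_rate(30), 2))
```

Under these assumptions each pattern holds 54 problems with six at each position, and thirty errors over all five sizes corresponds to an error rate of a little under 2 per cent.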


Received November 7, 1949
Book Reviews


Libo, L.M. Attitude prediction in labor rela- 
tions—A test of understanding. Studies in 
Industrial Relations, No. 10, Division of In- 
dustrial Relations, Graduate School of 
Business, Stanford University, 1948. Pp. 
15. $1.00. 


The thesis of this monograph is that one first
step to good industrial relations is the under- 
standing on the part of labor and management 
of the points of view of the other party. Such 
understanding together with willingness to 
understand, ability to learn, and availability 
of the other party builds improved inter- 
personal and intergroup relationships. The 
extent of this understanding can be measured 
by comparing for each party its beliefs con- 
cerning the attitudes of the other party with 
the actual views of the other party. The 
study, then, essentially has two parts--the 
arguments concerning the major thesis and an 
illustrative attempt to measure the degree of 
understanding between labor and management. 

It is apparent that this study holds that the 
keynote to improved labor-management relations
is clarification. One might expect, therefore,
that both concepts and terminology
would be considered quite rigorously. While 
the author has striven mightily in this direc- 
tion, in the reviewer’s opinion he has not 
wholly succeeded in hitting his mark. Libo’s 
concept of understanding appears to change 
from time to time. In some instances it seems 
to refer simply to the accuracy with which one 
party perceives the point of view of the other 
while in others it seems to imply acceptance of 
those points of view. Similarly in certain 
parts of the discussion understanding appears 
to be a necessary condition for improvement of 
labor-management relations and in other parts 
this position is denied. It is never quite clear 
whether understanding is concerned with 
knowledge of the attitudes, motivations, etc., 
of the other party or the prediction of the most 
likely behavior under given conditions. 

As an illustration of the way in which Libo 
believes understanding can be measured, he 
administered some 24 opinion questions concerned
with labor-management issues to a
group of 44 labor leaders and to a group of 33
industrial relations directors. The members of 
each group not only indicated their opinions 
but also those that they would expect of the 
other group. Through an analysis of the re- 
sponses one is able not only to observe similari- 
ties and differences between the opinions of the 
two groups but also to observe the accuracy 
with which each group can predict (i.e., perceive)
the opinions of the other.

On all but two of the 24 issues the two parties 
expressed opposite opinions. Management's 
predictions of the opinions of labor were found 
to be considerably more accurate than labor’s 
predictions of the opinions of management. 
Generalizations of these findings are particu- 
larly hazardous in view of the fact that the two 
groups utilized do not have direct dealings with 
each other. The labor group consisted of representatives
of the International Longshoremen’s
and Warehousemen's Union (CIO), and
the management group were members of the
California Personnel Management Association
and each was from a different company.

On the positive side it can be said that the
entire monograph is most stimulating. It is
fruitful of ideas and should provoke further
interesting discussion and research. Those
interested in problems connected with such
studies undoubtedly can profit from Libo’s
analysis of the problem of measuring understanding.

For those who may wish to refer to this
monograph, it will be noted that the date is not
cited. The reviewer has been informed that
the date of publication is August 1948.

Edwin E. Ghiselli


University of California, 
Berkeley, California 


Van Der Lugt, M. J. A. V. D. L. Adult psy- 
chomotor test series for the measurement of 
manual ability. New York: New York Uni- 
versity, 1948 (mimeographed); V. D. L. 
Psychomotor test series for children for the 
measurement of manual ability. New York: 
New York University, 1948 (mimeographed). 


This series of ten relatively simple tests “for 
the measurement of manual ability” requires 
only a small amount of test equipment and
simple instructions for administration and 
scoring. The reliabilities for each test are 
said to be “over .90.” Five basic components 
of speed, pressure, accuracy, motor memory 
and coordination are each represented by two 
tests, as follows: 1. Speed-prehension; 2. Speed- 
asynkinesia; 3. Pressure-reproduction; 4. Pres- 
sure-control; 5. Accuracy-steadiness; 6. Ac- 
curacy-precision; 7. Motor memory-direction; 
8. Motor memory-spatial; 9. Coordination- 
static; and 10. Coordination-dynamic. 

Thus far it would seem that the battery 
fulfilled most of the requirements of useful 
measuring devices, since it samples most of the 
basic aspects (speed, etc.) of manual activities, 
is simple, objectively scored, and has a reason- 
able number of cases for its tentative norms. 

However, the basic problem of psychomotor
testing has nearly always been one of validity,
or “what the tests purport to measure.”

Ideally a psychomotor test battery should
measure representative samples of each component
of motor action, e.g., speed, precision,
etc., so that the subscores could be used to describe
or predict an individual’s skill or aptitude
for each component of any complex skill
which he might be expected to develop.

Several coefficients of validity are cited, cor- 
relations with five point ratings by employers 
of skill on the job in small factories: r =.73 for 
women and .65 for men on the static coordina- 
tion test, and .51 for women and .58 for men 
on the dynamic coordination test. Unfortunately
the numbers of cases are described
only as “small” and therefore cannot be
evaluated as to their representativeness.

The only adequate tests of validity for these
tests would be a series of correlations with a
variety of practical manual performances, each
on groups of persons large enough to permit 
known statistical significance and with criteria 
of known reliability. Fine (as contrasted with 
gross) psychomotor tests have long been known 
to be quite specific in nature or related only 
within narrow group factors of other tests, and 
the validity of fine motor tests for the predic- 
tion of manual skills has been adequately 
demonstrated for only a few practical skills 
(air crew specialists and rifle marksmen). 

In view of the above facts the reviewer may 
be pardoned for urging a very strong skepticism 
as to the appropriateness of naming the V.D.L. 
battery a psychomotor test series “for the 
measurement of manual ability.” The re- 
viewer’s own research strongly suggests that 
various components such as speed of fine 
motor skills are not primarily determined by 
any stable biological characteristic of the indi- 
vidual so much as they are determined by his 
happening to hit upon qualitatively different 
work methods which are often quite specific to 
each skill studied (cf. Seashore, R. H. Theo- 
retical and experimental analyses of fine motor 
skills. Amer. J. Psychol., 1940, 53, 86-98). 

It is to be hoped that Dr. Van Der Lugt will 
soon be able to furnish more definite evidence 
as to the validity or non-validity of the battery 
or its subtests for the prediction of a repre- 
sentative series of specific practical manual 
skills. Likewise the anticipated value of the 
battery for the appraisal of educational, clinical 
or developmental status of the individuals 
should be experimentally verified before we are 
justified in assuming this type of validity for 
the tests. 

Robert H. Seashore 


Northwestern University 





New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Donald G. Paterson, Editor, 
Department of Psychology, University of Minnesota, Minneapolis 14, Minnesota 


Beginning experimental psychology. S. Howard Bartley.
New York: McGraw-Hill Book Co., Inc., 1950. Pp.
483. $4.00.

Teaching the child to read. Guy L. Bond and Eva Bond
Wagner. Revised edition. New York: The Macmillan
Co., 1950. Pp. 467. $3.75.

A history of experimental psychology. Second edition.
Edwin G. Boring. New York: Appleton-Century-Crofts,
Inc., 1950. Pp. 775. $6.00.

Developing men for controllership. T. F. Bradshaw.
Boston: Division of Research, Harvard Business
School, 1950. Pp. 232. $3.25.

Selected readings in social psychology. Steuart Henderson
Britt, Editor. New York: Rinehart and Co.,
Inc., 1950. Pp. 507. $2.00.

Recent experiments in psychology. Second edition.
Leland W. Crafts, Elsa E. Robinson, Theodore C.
Schneirla, and Ralph W. Gilbert. New York: McGraw-Hill
Book Co., Inc., 1950. Pp. 491. $3.50.

Rating employee and supervisory performance. M. Joseph
Dooher and Vivienne Marquis, Editors. New York:
American Management Association, 1950. Pp. 192.
$3.75.

Handbook of employee selection. Roy M. Dorcus and
Margaret H. Jones. New York: McGraw-Hill Book
Co., Inc., 1950. Pp. 349. $4.50.

S. Weir Mitchell, novelist and physician. Ernest Earnest.
Philadelphia: University of Pennsylvania Press, 1950.
Pp. 278. $3.50.

Be your real self. David Harold Fink. New York:
Simon and Schuster, Inc., 1950. Pp. 307. $2.95.

You and your health. J. Roswell Gallagher. Chicago:
Science Research Associates, Inc., 1950. Pp. 48.
$.60.

Studies in leadership: leadership and democratic action.
Alvin W. Gouldner, Editor. New York: Harper and
Brothers, 1950. Pp. 768. $5.00.

The theory of mental tests. Harold Gulliksen. New
York: John Wiley and Sons, Inc., 1950. Pp. 462.
$6.00.

General clinical counseling. Milton E. Hahn and Malcolm
S. MacLean. New York: McGraw-Hill Book
Co., Inc., 1950. Pp. 373. $3.50.

Counseling the handicapped in the rehabilitation process.
Kenneth W. Hamilton. New York: The Ronald
Press Co., 1950. Pp. 296. $3.50.

Chats with teachers about counseling. S. A. Hamrin.
Bloomington, Ill.: McKnight and McKnight Publishing
Co., 1950. Pp. 224. $3.00.

How to use psychology for better advertising. Melvin S.
Hattwick. New York: Prentice-Hall, Inc., 1950.
Pp. 376. $5.75.

Speech therapy for the physically handicapped. Sara
Stinchfield Hawk. Stanford: Stanford University
Press, 1950. $4.00.

Testing results in social casework. J. McV. Hunt,
Margaret Blenkner, and Leonard S. Kogan. New
York: Family Service Association of America, 1950.
Pp. 64. $2.00.

Measuring results in social casework. J. McV. Hunt
and Leonard S. Kogan. New York: Family Service
Association of America, 1950. Pp. 79. $1.50.

Adolescent development. Elizabeth B. Hurlock. New
York: McGraw-Hill Book Co., Inc., 1950. Pp. 566.
$4.50.

The principles of psychology. William James. New
York: Dover Publications, Inc., 1950. Pp. 700.
$7.50.

Psychology in everyday living. Ralph L. Johns. New
York: Harper and Brothers, 1950. Pp. 564. $3.50.

A comparison of diagnostic and functional casework concepts.
Cora Kasius, Editor. New York: Family
Service Association of America, 1950. Pp. 169.
$2.00.

Tensions affecting international understanding. Otto
Klineberg. New York: Social Science Research Council,
1950. Pp. 227. Cloth, $2.25; Paper, $1.75.

The growth and development of executives. Myles L.
Mace. Boston: Division of Research, Harvard Business
School, 1950. Pp. 200. $3.25.

The meaning of anxiety. Rollo May. New York: The
Ronald Press Co., 1950. Pp. 376. $4.50.

Surveys, polls and samples. Mildred B. Parten. New
York: Harper and Brothers, 1950. Pp. 624. $5.00.

The criminality of women. Otto Pollak. Philadelphia:
University of Pennsylvania Press, 1950. Pp. 180.
$3.50.

The Porteus Maze Test and intelligence. Stanley D.
Porteus. Palo Alto: Pacific Books, 1950. Pp. 194.
$4.00.

Psychology, a biosocial study of behavior. E. Terry
Prothro and P. T. Teska. Boston: Ginn and Co.,
1950. Pp. 546. $3.75.

Psychological problems in mental deficiency. Seymour
B. Sarason. New York: Harper and Brothers, 1950.
Pp. 365. $5.00.

Religion and the cure of souls in Jung’s psychology.
Hans Schaer. New York: Pantheon Books, Inc.,
1950. Pp. 221. $3.50.

Occupational information. Carroll L. Shartle. New
York: Prentice-Hall, Inc., 1950. Pp. 339. $3.50.

Analytic group psychotherapy. S. R. Slavson. New
York: Columbia University Press, 1950. Pp. 275.
$3.50.

The psychology of mental health. Louis P. Thorpe.
New York: The Ronald Press Co., 1950. Pp. 747.
$5.00.

Educational psychology. William Clark Trow. Boston:
Houghton Mifflin Co., 1950.

Management behavior and foreman attitude. David N.
Ulrich, Donald R. Booz, and Paul R. Lawrence.
Boston: Division of Research, Harvard Business
School, 1950. Pp. 56. $.75.

De profundis. Oscar Wilde. New York: The Philosophical
Library, 1950. Pp. 148. $3.00.